From patchwork Wed May 30 12:31:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 137258 Delivered-To: patch@linaro.org Received: by 2002:a2e:9706:0:0:0:0:0 with SMTP id r6-v6csp5261394lji; Wed, 30 May 2018 05:32:08 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJlvj8yo9FmMNDI1E7v4yTsfxsUmdhF58cPRwmoiV2Qz7V4lFPZnRMNJefeEivkJWfT4ofI X-Received: by 2002:a63:a401:: with SMTP id c1-v6mr2106942pgf.110.1527683528284; Wed, 30 May 2018 05:32:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1527683528; cv=none; d=google.com; s=arc-20160816; b=ci0gXAVqK4EtRyhsbLiTya+f6Vt8tWkcvwzalrFe+G4tqp8bXOOUbTLVaQ+8YwjKRD DSH7UvQEv/r5GTQdUdLNjk7SPoyP2vJaFFAG4mOKu0vtZfZY4rgQhISmR8SxWeTZ/htC 3g1Fgzjzy3NbAZXZgRBe4Eb4+CFqkEfJZWqqQHgma6q6DkUvSCXmebHBYDzNIR8EgR0I arQFGg/dI54ciYrTTvlOXXnHMM8K1FX8Hep2b2oJMOLKnWGsGFAnq+G2aZJIEdib9tqZ F6QpxmUC+y5il6/vIy2uAuhI3fCtI8sV0+ZBMj6BAPETBu3N5TwtUyhVBMnAGS5MCf8n VKeQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:user-agent:message-id:in-reply-to:date:references :subject:cc:mail-followup-to:to:from:delivered-to:sender:list-help :list-post:list-archive:list-unsubscribe:list-id:precedence :mailing-list:dkim-signature:domainkey-signature :arc-authentication-results; bh=L2YquQRXhQWHZFTKvQ+8UWJ1yRQ2LpYp0UW4RaJ6EkI=; b=06WdX1DMv9Fng/eHEzAIoHCL1R435eaNaNPUqjVo08QR1RBRDVKZHAu8Vb1ox78q5R fVftgL3DurTUFGffU9lK8B6TBwh/NQzVD4CPrP9EadoXRFzEswjN0kyGeZ9v4JHV4ipn a5AFY2tnV83xGoI30r7g5SRP66hIHxasmMnZn6l3lNAwBwck2p9qUn3K8aeOQSS7ynGK K4bPjSCD0qm54cHa4LszlgfYKuCsU7+GuRFAcW749+cIwrKrSSSFVB8r5zn+M0g64i1F jddS5GwQoCk1E1tP4gSgDkwyFiBxwhsq+KXEVVngkA+wyQZ59P1qT0SLGOEcfPuo9ZyU 4E7w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=hsFdDvjm; spf=pass (google.com: domain of gcc-patches-return-478751-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-478751-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id l10-v6si11792387pga.38.2018.05.30.05.32.07 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 30 May 2018 05:32:08 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-478751-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=hsFdDvjm; spf=pass (google.com: domain of gcc-patches-return-478751-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-478751-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:references:date:in-reply-to:message-id :mime-version:content-type; q=dns; s=default; b=tqleY7TQVybf7rsG uGpmKjzTMYgT6bYA83uGtDkuz5nsmh98iZZuG9mrR7HmRPeyE4aXd97VfpZJQK6r VraX1vDpDao13uVGCEqP8Cwv2jxd7cebLMKAwEfELFM+RF8NYUWgFWdCmviA5gip 3GSee5sU4XwvNnajc1kOCUl9PUw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:references:date:in-reply-to:message-id :mime-version:content-type; s=default; bh=x0eBjbnbwVbIwx18gMs9CP wf8NA=; b=hsFdDvjmjVTUCipTMfS2MDPb6K4rawKx/OIOtNpjP+t+HRXz9BMngu YDj/cb1n/ygOD73kq6laMX5y9ip1Aie/5X2N7GZJTEAwgmdZbyUsOYn/hUGuqoNf DT83Yuib9hLI7ozFOf5HeVZTYSHBRiUrzy3TM0RehmAX9saNt9eHw= Received: (qmail 15585 invoked by alias); 30 May 2018 12:31:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 15400 invoked by uid 89); 30 May 2018 12:31:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-10.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=Inheritance, 1060 X-HELO: mail-wm0-f46.google.com Received: from mail-wm0-f46.google.com (HELO mail-wm0-f46.google.com) (74.125.82.46) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 30 May 2018 12:31:18 +0000 Received: by mail-wm0-f46.google.com with SMTP id q4-v6so1583006wmq.1 for ; Wed, 30 May 2018 05:31:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:cc:subject:references :date:in-reply-to:message-id:user-agent:mime-version; bh=L2YquQRXhQWHZFTKvQ+8UWJ1yRQ2LpYp0UW4RaJ6EkI=; b=gHJlQ7DwwvB8tklUrKvS+ZnOX3Dc6OIdkX254jZDfIKfwFi53soAnycZ8Qp+88vC2N MbbMtrYzY8p6jMyHJTeOfG0JIYPZBuv1kP5mCT/LeYt28U/Zmabzz9fQpoW0edgSGhyF 0CMcsBZe8w4beaq5dbWqkDtNP6OPt967kfykgyH4fhyrWX49OE3VzLs5fs7XkEV5L5by jrgGVoYPtHSgKSXWpHbWL+E+RHx1y0g/yrsdCvJX3kw0qXl3Wg2y4x/oF6gPonsBIPby s6cMhB9JV2FQjH4/BtcxSM5cfKv4L5w3YbEomGr0AwQ28O2D6IvtMxCxf4TBmIGiisUQ gTxQ== X-Gm-Message-State: ALKqPweaV5Hz1CNsFspHp5YdBdEoUS7c+LzFi0tgikjWzKFUmIEkTwRS CqyKWCnJggko49vPbwwGdkX8vw== X-Received: by 2002:a1c:b70b:: with SMTP id h11-v6mr1329037wmf.1.1527683475712; Wed, 30 May 2018 05:31:15 -0700 (PDT) Received: from localhost ([217.140.96.141]) by smtp.gmail.com with ESMTPSA id o53-v6sm55723875wrc.96.2018.05.30.05.31.14 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 30 May 2018 05:31:14 -0700 (PDT) From: Richard Sandiford To: Kyrill Tkachov Mail-Followup-To: Kyrill Tkachov , "gcc-patches\@gcc.gnu.org" , Marcus Shawcroft , "Richard Earnshaw \(lists\)" , James Greenhalgh , vmakarov@redhat.com, richard.sandiford@linaro.org Cc: "gcc-patches\@gcc.gnu.org" , Marcus Shawcroft , "Richard Earnshaw \(lists\)" , James Greenhalgh , vmakarov@redhat.com Subject: Tighten LRA test for reloading the inner reg of a paradoxical subreg References: <5B0D5460.7030603@foss.arm.com> <87zi0ipmfv.fsf@e105548-lin.cambridge.arm.com> <5B0D668F.7080503@foss.arm.com> Date: Wed, 30 May 2018 13:31:13 +0100 In-Reply-To: <5B0D668F.7080503@foss.arm.com> (Kyrill Tkachov's message of "Tue, 29 May 2018 15:41:19 +0100") Message-ID: <877enl9vfy.fsf_-_@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Kyrill Tkachov writes: > Hi Richard, > > On 29/05/18 15:26, Richard Sandiford wrote: >> Kyrill Tkachov writes: >>> Hi all, >>> >>> The recent changes to aarch64_expand_vector_init cause an ICE in the >>> attached testcase. The register allocator "ICEs with Max. number of >>> generated reload insns per insn is achieved (90)" >>> >>> That is because aarch64_expand_vector_init creates a paradoxical >>> subreg to move a DImode value >>> into a V2DI vector: >>> (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ]) >>> (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1050 >>> {*aarch64_simd_movv2di} >>> >>> This is done because we want to express that the whole of the V2DI >>> vector will be written so that init-regs doesn't try to >>> zero-initialise it before we overwrite each lane individually anyway. >>> >>> This can go bad for because if the DImode value is allocated in, say, >>> x30: the last register in that register class, the V2DI subreg of that >>> isn't valid or meaningful and that seems to cause the trouble. >>> >>> It's kinda hard to know what the right solution for this is. >>> We could emit a duplicate of the value into all the lanes of the >>> vector, but we have tests that test against that >>> (we're trying to avoid unnecessary duplicates) >>> >>> What this patch does is it defines a pattern for moving a scalar into >>> lane 0 of a vector using a simple FMOV or LDR and represents that as a >>> merging with a vector of zeroes. That way, the instruction represents >>> a full write of the destination vector but doesn't "read" more bits >>> from the source than necessary. The zeroing effect is also a more >>> accurate representation of the semantics of FMOV. >> This feels like a hack. Either the paradoxical subreg of the pseudo >> is invalid for some reason (in which case we should ICE when it's formed, >> not just in the case of x30 being allocated) or the subreg is valid, >> in which case the RA should handle it correctly (and the backend should >> give it the information it needs to do that). >> >> I could see the argument for ignoring the problem for expediency if the >> patch was a clean-up in its own right, but I think it's wrong to add so >> much code to paper over a bug. > > I see what you mean. Do you have any thoughts on where in RA we'd go > about fixing this? Since I don't know my way around RA I tried in the > place I was most comfortable changing :) Ah, but everyone who's patched the RA had to patch it for the first time. :-) And it's not that scary. But anyway... The original insn was: (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ]) (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1060 {*aarch64_simd_movv2di} (nil)) which IRA converted to: (insn 74 72 580 8 (set (reg:V2DI 287 [ _166 ]) (subreg:V2DI (reg/v/f:DI 517 [orig:112 d ] [112]) 0)) 1060 {*aarch64_simd_movv2di} (nil)) after creating loop allocnos. It happens that the ALLOCNO_WMODEs for both 112 and 517 were not set to V2DI due to another bug that I'll post a separate patch for, but we nevertheless got a valid allocation of register 1. LRA's first try at constraining the instruction gave: Choosing alt 5 in insn 74: (0) ?w (1) r {*aarch64_simd_movv2di} at which point all was good. But LRA later decided it needed to spill r517: Spill r517 after risky transformations so the next constraint attempt gave: Choosing alt 0 in insn 74: (0) =w (1) m {*aarch64_simd_movv2di} which was still good. Then during inheritance we had: Creating newreg=672 from oldreg=517, assigning class GENERAL_REGS to inheritance r672 Original reg change 517->672 (bb8): 74: r287:V2DI=r672:DI#0 Add inheritance<-original before: 939: r672:DI=r517:DI Inheritance reuse change 517->672 (bb8): 620: r572:DI=r672:DI REG_DEAD r672:DI Use smallest class of POINTER_REGS and GENERAL_REGS Creating newreg=673 from oldreg=517, assigning class POINTER_REGS to inheritance r673 Original reg change 517->673 (bb8): 936: r669:DI=r673:DI Add inheritance<-original before: 940: r673:DI=r517:DI ("Use smallest class of POINTER_REGS and GENERAL_REGS" ought to give GENERAL_REGS. That might be a missed optimisation, and probably due to both classes having the same number of allocatable registers. I'll look at that as a follow-on.) Thus LRA created two inheritance registers for r517, one (r673) that included the unallocatable x31 and another (r672) that didn't. The r672 references included the paradoxical subreg in insn 74 but the r673 ones didn't. LRA then allocated x30 to r673, which was a valid choice. Later LRA decided to "undo" the inheritance for insn 620, but because of the double inheritance, it got confused as to what the original situation was, and made insn 74 use the other inheritance register instead of r517: ********** Undoing inheritance #2: ********** Inherit 11 out of 12 (91.67%) Insn after restoring regs: 620: r572:DI=r517:DI REG_DEAD r517:DI Change reload insn: 74: r287:V2DI=r673:DI#0 <------------------- Insn after restoring regs: 939: r517:DI=r673:DI REG_DEAD r673:DI This might be a bug in itself: we should probably look through sets of other inheritance pseudos to find the "real" origin. Either way, at this point we had a situation in which r673 was used in an insn whose subreg was larger than the biggest_mode that r673 had when it was allocated. While x30 was valid for the original biggest_mode, it wasn't valid for this subreg use. The next attempt to constrain insn 74 was: Choosing alt 5 in insn 74: (0) ?w (1) r {*aarch64_simd_movv2di} Creating newreg=684, assigning class GENERAL_REGS to r684 74: r287:V2DI=r684:V2DI Inserting insn reload before: 951: r684:V2DI=r673:DI#0 where LRA reloaded the SUBREG rather than the SUBREG_REG. And it then cycled trying the same thing when reloading the reload (and the reload of the reload, etc.). What it should be doing here is reloading the SUBREG_REG instead. There's already code to cope with this case when the paradoxical subreg falls outside the class (which isn't true here, since r673 is POINTER_REGS and POINTER_REGS includes x31). But I think we should also test whether LRA is entitled to allocate the spanned registers. Not doing that seems like a bug regardless of the above missed optimisation and the mix-up undoing inheritance. Tested so far on aarch64-linux-gnu. I'll try aarch64_be-elf and x86_64-linux-gnu too. Thanks, Richard 2018-05-30 Richard Sandiford gcc/ * lra-constraints.c (simplify_operand_subreg): In the paradoxical case, check whether the outer register overlaps an unallocatable register, not just whether it fits the required class. gcc/testsuite/ * g++.dg/torture/aarch64-vect-init-1.C: New test. Index: gcc/lra-constraints.c =================================================================== --- gcc/lra-constraints.c 2018-03-30 12:28:37.301927949 +0100 +++ gcc/lra-constraints.c 2018-05-30 12:07:18.220421910 +0100 @@ -1722,7 +1722,13 @@ simplify_operand_subreg (int nop, machin (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0)) {*movti_internal_rex64} Two reload hard registers will be allocated to reg180 to save TImode data - in LRA_assign. */ + in LRA_assign. + + For LRA pseudos this should normally be handled by the biggest_mode + mechanism. However, it's possible for new uses of an LRA pseudo + to be introduced after we've allocated it, such as when undoing + inheritance, and the allocated register might not then be appropriate + for the new uses. */ else if (REG_P (reg) && REGNO (reg) >= FIRST_PSEUDO_REGISTER && (hard_regno = lra_get_regno_hard_regno (REGNO (reg))) >= 0 @@ -1731,7 +1737,9 @@ simplify_operand_subreg (int nop, machin && (regclass = lra_get_allocno_class (REGNO (reg))) && (type != OP_IN || !in_hard_reg_set_p (reg_class_contents[regclass], - mode, hard_regno))) + mode, hard_regno) + || overlaps_hard_reg_set_p (lra_no_alloc_regs, + mode, hard_regno))) { /* The class will be defined later in curr_insn_transform. */ enum reg_class rclass Index: gcc/testsuite/g++.dg/torture/aarch64-vect-init-1.C =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/g++.dg/torture/aarch64-vect-init-1.C 2018-05-30 12:07:18.220421910 +0100 @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-mcpu=cortex-a72" { target aarch64*-*-* } } */ + +class A { +public: + unsigned char *fn1(); + int fn2(); +}; + +class B { + A fld1; + int fld2; + void fn3(); + unsigned char fld3; +}; + +int a; + +void +B::fn3() { + int b = fld1.fn2() / 8; + unsigned char *c = fld1.fn1(), *d = &fld3, *e = c; + for (; a < fld2;) + for (int j = 0; j < b; j++) + *d++ = e[j]; + for (; 0 < fld2;) + for (int j = 0; j < b; j++) + e[j] = *d++; + for (; fld2;) + ; +}