From patchwork Thu May 26 13:35:38 2011
From: Andrew Stubbs <ams@codesourcery.com>
Organization: CodeSourcery
Date: Thu, 26 May 2011 14:35:38 +0100
To: "Joseph S. Myers"
Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches@gcc.gnu.org, patches@linaro.org
Subject: Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate
Message-ID: <4DDE572A.6060305@codesourcery.com>

On 25/05/11 14:47, Joseph S. Myers wrote:
> The shift must be by a positive constant amount, strictly less than the
> precision (GET_MODE_PRECISION) of the mode (of the value being shifted).
> If that applies, the relevant number of bits is the precision of the mode
> minus the number of bits of the shift.  For an extension, just take the
> number of bits in the inner mode.  Add the two numbers of bits; if the
> result does not exceed the number of bits in the mode (of the operands and
> the multiplication) then the multiplication won't overflow.

I believe the attached should implement what you describe.  Is the patch
OK now?

Andrew

2011-05-26  Bernd Schmidt
            Andrew Stubbs

gcc/
        * simplify-rtx.c (simplify_unary_operation_1): Canonicalize
        widening multiplies.
        * doc/md.texi (Canonicalization of Instructions): Document
        widening multiply canonicalization.

gcc/testsuite/
        * gcc.target/arm/mla-2.c: New test.
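To make the overflow argument concrete, here is a minimal standalone sketch
of the bit-counting rule (hypothetical helper names, plain C, not GCC
internals; the patch itself works on modes via GET_MODE_PRECISION and
INTVAL):

#include <stdbool.h>

/* Significant bits contributed by a right-shift operand: the bits not
   shifted off the end.  Only meaningful when 0 < shift_count <
   mode_precision, mirroring the validity condition quoted above.  */
static int
bits_after_shift (int mode_precision, int shift_count)
{
  return mode_precision - shift_count;
}

/* An extension operand contributes the precision of its inner mode.
   The product cannot overflow when the operands' bit counts fit in the
   multiplication's mode, e.g. 16-bit x 16-bit done in 32 bits:
   16 + 16 = 32 <= 32.  */
static bool
mult_cannot_overflow (int lhs_bits, int rhs_bits, int mult_precision)
{
  return lhs_bits + rhs_bits <= mult_precision;
}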
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5840,6 +5840,23 @@
 Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations.
 
+@cindex @code{mult}, canonicalization of
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x})
+(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1}
+(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise
+for @code{zero_extend}.
+
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2}
+@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted
+to @code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2}
+@var{x} @var{s})) (sign_extend:@var{m1} @var{y}))}, and likewise for
+patterns using @code{zero_extend} and @code{lshiftrt}.  If the second
+operand of @code{mult} is also a shift, then that is extended as well.
+This transformation is only applied when it can be proven that the
+original operation had sufficient precision to prevent overflow.
+
 @end itemize
 
 Further canonicalization rules are defined in the function
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1000,6 +1000,48 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
           && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
         return XEXP (op, 0);
 
+      /* Extending a widening multiplication should be canonicalized to
+         a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT)
+        {
+          rtx lhs = XEXP (op, 0);
+          rtx rhs = XEXP (op, 1);
+          enum rtx_code lcode = GET_CODE (lhs);
+          enum rtx_code rcode = GET_CODE (rhs);
+
+          /* Widening multiplies usually extend both operands, but sometimes
+             they use a shift to extract a portion of a register.  */
+          if ((lcode == SIGN_EXTEND
+               || (lcode == ASHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
+              && (rcode == SIGN_EXTEND
+                  || (rcode == ASHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
+            {
+              enum machine_mode lmode = GET_MODE (lhs);
+              enum machine_mode rmode = GET_MODE (rhs);
+              int bits;
+
+              if (lcode == ASHIFTRT)
+                /* Number of bits not shifted off the end.  */
+                bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
+              else /* lcode == SIGN_EXTEND */
+                /* Size of inner mode.  */
+                bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
+
+              if (rcode == ASHIFTRT)
+                bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
+              else /* rcode == SIGN_EXTEND */
+                bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
+
+              /* We can only widen multiplies if the result is mathematically
+                 equivalent, i.e. if overflow was impossible.  */
+              if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
+                return simplify_gen_binary
+                         (MULT, mode,
+                          simplify_gen_unary (SIGN_EXTEND, mode, lhs, lmode),
+                          simplify_gen_unary (SIGN_EXTEND, mode, rhs, rmode));
+            }
+        }
+
       /* Check for a sign extension of a subreg of a promoted
          variable, where the promotion is sign-extended, and the
          target mode is the same as the variable's promotion.  */
@@ -1071,6 +1113,48 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
           && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0))))
         return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
+      /* Extending a widening multiplication should be canonicalized to
+         a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT)
+        {
+          rtx lhs = XEXP (op, 0);
+          rtx rhs = XEXP (op, 1);
+          enum rtx_code lcode = GET_CODE (lhs);
+          enum rtx_code rcode = GET_CODE (rhs);
+
+          /* Widening multiplies usually extend both operands, but sometimes
+             they use a shift to extract a portion of a register.  */
+          if ((lcode == ZERO_EXTEND
+               || (lcode == LSHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
+              && (rcode == ZERO_EXTEND
+                  || (rcode == LSHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
+            {
+              enum machine_mode lmode = GET_MODE (lhs);
+              enum machine_mode rmode = GET_MODE (rhs);
+              int bits;
+
+              if (lcode == LSHIFTRT)
+                /* Number of bits not shifted off the end.  */
+                bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
+              else /* lcode == ZERO_EXTEND */
+                /* Size of inner mode.  */
+                bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
+
+              if (rcode == LSHIFTRT)
+                bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
+              else /* rcode == ZERO_EXTEND */
+                bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
+
+              /* We can only widen multiplies if the result is mathematically
+                 equivalent, i.e. if overflow was impossible.  */
+              if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
+                return simplify_gen_binary
+                         (MULT, mode,
+                          simplify_gen_unary (ZERO_EXTEND, mode, lhs, lmode),
+                          simplify_gen_unary (ZERO_EXTEND, mode, rhs, rmode));
+            }
+        }
+
       /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>).  */
       if (GET_CODE (op) == ZERO_EXTEND)
         return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mla-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long foolong (long long x, short *a, short *b)
+{
+  return x + *a * *b;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
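For illustration only (not part of the patch): a hypothetical unsigned
counterpart of mla-2.c should exercise the zero_extend/lshiftrt path in the
same way; on ARM I would expect a umlal here, though this variant carries no
scan-assembler check.

/* Hypothetical unsigned analogue of mla-2.c: the multiply happens in
   32-bit unsigned arithmetic and its result is zero-extended into the
   64-bit accumulation, matching the ZERO_EXTEND case above.  */
unsigned long long
fooulong (unsigned long long x, unsigned short *a, unsigned short *b)
{
  return x + (unsigned int) *a * *b;
}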