From patchwork Thu May 26 13:35:38 2011
From: Andrew Stubbs <ams@codesourcery.com>
Organization: CodeSourcery
Date: Thu, 26 May 2011 14:35:38 +0100
To: "Joseph S. Myers"
Cc: Bernd Schmidt, Richard Earnshaw, gcc-patches@gcc.gnu.org, patches@linaro.org
Subject: Re: [patch][simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate
Message-ID: <4DDE572A.6060305@codesourcery.com>

On 25/05/11 14:47, Joseph S. Myers wrote:
> The shift must be by a positive constant amount, strictly less than the
> precision (GET_MODE_PRECISION) of the mode (of the value being shifted).
> If that applies, the relevant number of bits is the precision of the mode
> minus the number of bits of the shift.  For an extension, just take the
> number of bits in the inner mode.  Add the two numbers of bits; if the
> result does not exceed the number of bits in the mode (of the operands and
> the multiplication) then the multiplication won't overflow.

I believe the attached should implement what you describe.  Is the patch
OK now?

Andrew

2011-05-26  Bernd Schmidt
            Andrew Stubbs

gcc/
        * simplify-rtx.c (simplify_unary_operation_1): Canonicalize
        widening multiplies.
        * doc/md.texi (Canonicalization of Instructions): Document
        widening multiply canonicalization.

gcc/testsuite/
        * gcc.target/arm/mla-2.c: New test.
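To make the overflow argument concrete, here is a minimal standalone sketch
of the bit-counting rule (hypothetical helper names, plain C, not GCC
internals; the patch itself works on modes via GET_MODE_PRECISION and
INTVAL):

#include <stdbool.h>

/* Significant bits contributed by a right-shift operand: the bits not
   shifted off the end.  Only meaningful when 0 < shift_count <
   mode_precision, mirroring the validity condition quoted above.  */
static int
bits_after_shift (int mode_precision, int shift_count)
{
  return mode_precision - shift_count;
}

/* An extension operand contributes the precision of its inner mode.
   The product cannot overflow when the operands' bit counts fit in the
   multiplication's mode, e.g. 16-bit x 16-bit done in 32 bits:
   16 + 16 = 32 <= 32.  */
static bool
mult_cannot_overflow (int lhs_bits, int rhs_bits, int mult_precision)
{
  return lhs_bits + rhs_bits <= mult_precision;
}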
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5840,6 +5840,23 @@
 Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations.
 
+@cindex @code{mult}, canonicalization of
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x})
+(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1}
+(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise
+for @code{zero_extend}.
+
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2}
+@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted
+to @code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2}
+@var{x} @var{s})) (sign_extend:@var{m1} @var{y}))}, and likewise for
+patterns using @code{zero_extend} and @code{lshiftrt}.  If the second
+operand of @code{mult} is also a shift, then that is extended as well.
+This transformation is only applied when it can be proven that the
+original operation had sufficient precision to prevent overflow.
+
 @end itemize
 
 Further canonicalization rules are defined in the function
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1000,6 +1000,48 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
           && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
         return XEXP (op, 0);
 
+      /* Extending a widening multiplication should be canonicalized to
+         a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT)
+        {
+          rtx lhs = XEXP (op, 0);
+          rtx rhs = XEXP (op, 1);
+          enum rtx_code lcode = GET_CODE (lhs);
+          enum rtx_code rcode = GET_CODE (rhs);
+
+          /* Widening multiplies usually extend both operands, but sometimes
+             they use a shift to extract a portion of a register.  */
+          if ((lcode == SIGN_EXTEND
+               || (lcode == ASHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
+              && (rcode == SIGN_EXTEND
+                  || (rcode == ASHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
+            {
+              enum machine_mode lmode = GET_MODE (lhs);
+              enum machine_mode rmode = GET_MODE (rhs);
+              int bits;
+
+              if (lcode == ASHIFTRT)
+                /* Number of bits not shifted off the end.  */
+                bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
+              else /* lcode == SIGN_EXTEND */
+                /* Size of inner mode.  */
+                bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
+
+              if (rcode == ASHIFTRT)
+                bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
+              else /* rcode == SIGN_EXTEND */
+                bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
+
+              /* We can only widen multiplies if the result is mathematically
+                 equivalent, i.e. if overflow was impossible.  */
+              if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
+                return simplify_gen_binary
+                         (MULT, mode,
+                          simplify_gen_unary (SIGN_EXTEND, mode, lhs, lmode),
+                          simplify_gen_unary (SIGN_EXTEND, mode, rhs, rmode));
+            }
+        }
+
       /* Check for a sign extension of a subreg of a promoted
          variable, where the promotion is sign-extended, and the
          target mode is the same as the variable's promotion.  */
@@ -1071,6 +1113,48 @@ simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
           && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0))))
         return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
+      /* Extending a widening multiplication should be canonicalized to
+         a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT)
+        {
+          rtx lhs = XEXP (op, 0);
+          rtx rhs = XEXP (op, 1);
+          enum rtx_code lcode = GET_CODE (lhs);
+          enum rtx_code rcode = GET_CODE (rhs);
+
+          /* Widening multiplies usually extend both operands, but sometimes
+             they use a shift to extract a portion of a register.  */
+          if ((lcode == ZERO_EXTEND
+               || (lcode == LSHIFTRT && CONST_INT_P (XEXP (lhs, 1))))
+              && (rcode == ZERO_EXTEND
+                  || (rcode == LSHIFTRT && CONST_INT_P (XEXP (rhs, 1)))))
+            {
+              enum machine_mode lmode = GET_MODE (lhs);
+              enum machine_mode rmode = GET_MODE (rhs);
+              int bits;
+
+              if (lcode == LSHIFTRT)
+                /* Number of bits not shifted off the end.  */
+                bits = GET_MODE_PRECISION (lmode) - INTVAL (XEXP (lhs, 1));
+              else /* lcode == ZERO_EXTEND */
+                /* Size of inner mode.  */
+                bits = GET_MODE_PRECISION (GET_MODE (XEXP (lhs, 0)));
+
+              if (rcode == LSHIFTRT)
+                bits += GET_MODE_PRECISION (rmode) - INTVAL (XEXP (rhs, 1));
+              else /* rcode == ZERO_EXTEND */
+                bits += GET_MODE_PRECISION (GET_MODE (XEXP (rhs, 0)));
+
+              /* We can only widen multiplies if the result is mathematically
+                 equivalent, i.e. if overflow was impossible.  */
+              if (bits <= GET_MODE_PRECISION (GET_MODE (op)))
+                return simplify_gen_binary
+                         (MULT, mode,
+                          simplify_gen_unary (ZERO_EXTEND, mode, lhs, lmode),
+                          simplify_gen_unary (ZERO_EXTEND, mode, rhs, rmode));
+            }
+        }
+
       /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>).  */
       if (GET_CODE (op) == ZERO_EXTEND)
         return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mla-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long foolong (long long x, short *a, short *b)
+{
+  return x + *a * *b;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */
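For illustration only (not part of the patch): a hypothetical unsigned
counterpart of mla-2.c should exercise the zero_extend/lshiftrt path in the
same way; on ARM I would expect a umlal here, though this variant carries no
scan-assembler check.

/* Hypothetical unsigned analogue of mla-2.c: the multiply happens in
   32-bit unsigned arithmetic and its result is zero-extended into the
   64-bit accumulation, matching the ZERO_EXTEND case above.  */
unsigned long long
fooulong (unsigned long long x, unsigned short *a, unsigned short *b)
{
  return x + (unsigned int) *a * *b;
}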