From patchwork Mon Jul  4 14:23:13 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs <ams@codesourcery.com>
X-Patchwork-Id: 2437
Return-Path: <patch+caf_=linaro-patchwork=canonical.com@linaro.org>
X-Original-To: patchwork@peony.canonical.com
Delivered-To: patchwork@peony.canonical.com
Received: from fiordland.canonical.com (fiordland.canonical.com
 [91.189.94.145])
 by peony.canonical.com (Postfix) with ESMTP id 23D2B23F53
 for <patchwork@peony.canonical.com>;
 Mon,  4 Jul 2011 14:23:25 +0000 (UTC)
Received: from mail-qw0-f52.google.com (mail-qw0-f52.google.com
 [209.85.216.52])
 by fiordland.canonical.com (Postfix) with ESMTP id CC0EFA18231
 for <linaro-patchwork@canonical.com>;
 Mon,  4 Jul 2011 14:23:24 +0000 (UTC)
Received: by qwb8 with SMTP id 8so3563984qwb.11
 for <linaro-patchwork@canonical.com>;
 Mon, 04 Jul 2011 07:23:24 -0700 (PDT)
Received: by 10.229.1.140 with SMTP id 12mr4713809qcf.118.1309789404220;
 Mon, 04 Jul 2011 07:23:24 -0700 (PDT)
X-Forwarded-To: linaro-patchwork@canonical.com
X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com
Delivered-To: patches@linaro.org
Received: by 10.229.48.135 with SMTP id r7cs29354qcf;
 Mon, 4 Jul 2011 07:23:23 -0700 (PDT)
Received: by 10.42.74.135 with SMTP id w7mr3494389icj.506.1309789402053;
 Mon, 04 Jul 2011 07:23:22 -0700 (PDT)
Received: from mail.codesourcery.com (mail.codesourcery.com [38.113.113.100])
 by mx.google.com with ESMTPS id
 f10si18998679icx.78.2011.07.04.07.23.20
 (version=TLSv1/SSLv3 cipher=OTHER);
 Mon, 04 Jul 2011 07:23:20 -0700 (PDT)
Received-SPF: pass (google.com: domain of ams@codesourcery.com designates
 38.113.113.100 as permitted sender) client-ip=38.113.113.100; 
Authentication-Results: mx.google.com;
 spf=pass (google.com: domain of ams@codesourcery.com
 designates 38.113.113.100 as permitted sender)
 smtp.mail=ams@codesourcery.com
Received: (qmail 26787 invoked from network); 4 Jul 2011 14:23:17 -0000
Received: from unknown (HELO ?192.168.0.100?) (ams@127.0.0.2)
 by mail.codesourcery.com with ESMTPA; 4 Jul 2011 14:23:17 -0000
Message-ID: <4E11CCD1.4010505@codesourcery.com>
Date: Mon, 04 Jul 2011 15:23:13 +0100
From: Andrew Stubbs <ams@codesourcery.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
 rv:1.9.2.17) Gecko/20110516 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: Richard Guenther <richard.guenther@gmail.com>
CC: Michael Matz <matz@suse.de>, gcc-patches@gcc.gnu.org, patches@linaro.org
Subject: Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
References: <4E034EF2.3070503@codesourcery.com>	<4E03504B.9060305@codesourcery.com>	<BANLkTi=a8B-DZZdG2bmOWQ8A+-pCSEsAbQ@mail.gmail.com>	<4E044559.5000105@linaro.org>	<BANLkTimfKyrARS+eRy+MWYVO2gLqp85JsQ@mail.gmail.com>	<1A77B5B39081C241A68E6CF16983025F020906F6@EU1-MAIL.mgc.mentorg.com>	<BANLkTi=1TRM7uHWxLD3Se=S-ibe0C13T-Q@mail.gmail.com>	<4E09B142.4020402@codesourcery.com>	<BANLkTim0wLcSTgAL9isO+dVXxgq-U8B4Sw@mail.gmail.com>	<Pine.LNX.4.64.1106281741170.17115@wotan.suse.de>	<4E09FDEA.3000004@gmail.com>	<Pine.LNX.4.64.1106281827230.17115@wotan.suse.de>	<1A77B5B39081C241A68E6CF16983025F0209071D@EU1-MAIL.mgc.mentorg.com>
 <BANLkTimJob8C2L8kYkd-aU6FbM5nckc8Yg@mail.gmail.com>
In-Reply-To: <BANLkTimJob8C2L8kYkd-aU6FbM5nckc8Yg@mail.gmail.com>

On 01/07/11 13:25, Richard Guenther wrote:
> Well - some operations work the same on both signedness if you
> just care about the twos-complement result.  This includes
> multiplication (but not for example division).  For this special
> case I suggest to not bother trying to invent a generic predicate
> but do something local in tree-ssa-math-opts.c.

OK, here's my updated patch.

I've taken the view that we *know* what size and signedness the result 
of the multiplication is, and we know what size the input to the 
addition must be, so all the check has to do is make sure it does that 
same conversion, even if by a roundabout means.

What I hadn't grasped before is that when extending a value it's the 
source type that is significant, not the destination, so the checks are 
not as complex as I had thought.

So, this patch adds a test to ensure that:

  1. the type is not truncated so far that we lose any information; and

  2. the type is only ever extended in the proper signedness.

Also, just to be absolutely sure, I've also added a little bit of logic 
to permit extends that are then undone by a truncate. I'm really not 
sure what guarantees there are about what sort of cast sequences can 
exist? Is this necessary? I haven't managed to coax it to generated any 
examples of extends followed by truncates myself, but in any case, it's 
hardly any code and it'll make sure it's future proofed.

OK?

Andrew

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (valid_types_for_madd_p): New function.
	(convert_plusminus_to_widen): Use valid_types_for_madd_p to
	identify optimization candidates.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

---
 .../gcc/testsuite/gcc.target/arm/no-wmla-1.c       |   11 ++
 .../gcc/testsuite/gcc.target/arm/wmul-5.c          |   10 ++
 src/gcc-mainline/gcc/tree-ssa-math-opts.c          |  112 ++++++++++++++++++--
 3 files changed, 123 insertions(+), 10 deletions(-)
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c

diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
new file mode 100644
index 0000000..17f7427
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+     int bc = b * c;
+        return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
new file mode 100644
index 0000000..65c43e3
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
diff --git a/src/gcc-mainline/gcc/tree-ssa-math-opts.c b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
index d55ba57..5ef7bb4 100644
--- a/src/gcc-mainline/gcc/tree-ssa-math-opts.c
+++ b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
@@ -2085,6 +2085,78 @@ convert_mult_to_widen (gimple stmt)
   return true;
 }
 
+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,
+   and then the convertions between the output of the multiply, and
+   the input to an addition EXPR, to ensure that they are compatible with
+   a widening multiply-and-accumulate.
+
+   This function assumes that expr is a valid string of conversion expressions
+   terminated by a multiplication.
+
+   This function tries NOT to make any (fragile) assumptions about what
+   sequence of conversions can exist in the input.  */
+
+static bool
+valid_types_for_madd_p (tree type1, tree type2, tree expr)
+{
+  gimple stmt, prev_stmt;
+  enum tree_code code, prev_code;
+  tree prev_expr, type, prev_type;
+  int bitsize, prev_bitsize, initial_bitsize, min_bitsize;
+  bool initial_unsigned;
+
+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+  code = gimple_assign_rhs_code (stmt);
+  type = TREE_TYPE (expr);
+  bitsize = TYPE_PRECISION (type);
+  min_bitsize = bitsize;
+
+  if (code == MULT_EXPR || code == WIDEN_MULT_EXPR)
+    return true;
+
+  if (!INTEGRAL_TYPE_P (type)
+      || TYPE_PRECISION (type) < initial_bitsize)
+    return false;
+
+  /* Step through the conversions backwards.  */
+  while (true)
+    {
+      prev_expr = gimple_assign_rhs1 (stmt);
+      prev_stmt = SSA_NAME_DEF_STMT (prev_expr);
+      prev_code = gimple_assign_rhs_code (prev_stmt);
+      prev_type = TREE_TYPE (prev_expr);
+      prev_bitsize = TYPE_PRECISION (prev_type);
+
+      if (prev_code == MULT_EXPR || prev_code == WIDEN_MULT_EXPR)
+	break;
+
+      /* If it's an unsuitable type or a truncate that damages the
+	 original value, then were done.  */
+      if (!INTEGRAL_TYPE_P (prev_type)
+	  || TYPE_PRECISION (prev_type) < initial_bitsize)
+	return false;
+
+      /* If we have the wrong sort of extend for the value, then it
+	 could still be ok if we already saw a truncate that reverses
+	 the effect.  */
+      if (bitsize > prev_bitsize
+	  && TYPE_UNSIGNED (prev_type) != initial_unsigned
+	  && min_bitsize > prev_bitsize)
+	return false;
+
+      stmt = prev_stmt;
+      code = prev_code;
+      type = prev_type;
+      bitsize = prev_bitsize;
+      min_bitsize = bitsize < min_bitsize ? bitsize : min_bitsize;
+    }
+
+  return true;
+}
+
 /* Process a single gimple statement STMT, which is found at the
    iterator GSI and has a either a PLUS_EXPR or a MINUS_EXPR as its
    rhs (given by CODE), and try to convert it into a
@@ -2098,6 +2170,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
   tree type, type1, type2;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
+  tree tmp, mult_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
   enum tree_code wmult_code;
@@ -2117,22 +2190,32 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
+  for (tmp = rhs1, rhs1_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs1_stmt))
     {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+      rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs1_stmt))
+	break;
+      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }
-  else
+
+  if (TREE_CODE (tmp) != SSA_NAME)
     return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
+  for (tmp = rhs2, rhs2_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs2_code) || rhs2_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs2_stmt))
     {
-      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-      if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+      rhs2_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs2_stmt))
+	break;
+      rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
     }
-  else
+
+  if (TREE_CODE (tmp) != SSA_NAME)
     return false;
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
@@ -2140,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
+      mult_rhs = rhs1;
       add_rhs = rhs2;
     }
   else if (rhs2_code == MULT_EXPR)
@@ -2147,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
+      mult_rhs = rhs2;
       add_rhs = rhs1;
     }
   else if (code == PLUS_EXPR && rhs1_code == WIDEN_MULT_EXPR)
@@ -2155,6 +2240,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = gimple_assign_rhs2 (rhs1_stmt);
       type1 = TREE_TYPE (mult_rhs1);
       type2 = TREE_TYPE (mult_rhs2);
+      mult_rhs = rhs1;
       add_rhs = rhs2;
     }
   else if (rhs2_code == WIDEN_MULT_EXPR)
@@ -2163,6 +2249,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = gimple_assign_rhs2 (rhs2_stmt);
       type1 = TREE_TYPE (mult_rhs1);
       type2 = TREE_TYPE (mult_rhs2);
+      mult_rhs = rhs2;
       add_rhs = rhs1;
     }
   else
@@ -2171,6 +2258,11 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
+  /* Verify that the convertions between the mult and the add doesn't do
+     anything unexpected.  */
+  if (!valid_types_for_madd_p (type1, type2, mult_rhs))
+    return false;
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */