From patchwork Sat Sep 23 08:24:48 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@linaro.org>
X-Patchwork-Id: 114122
Delivered-To: patch@linaro.org
Received: by 10.140.106.117 with SMTP id d108csp415916qgf;
 Sat, 23 Sep 2017 01:25:14 -0700 (PDT)
X-Received: by 10.84.151.68 with SMTP id i62mr1515546pli.179.1506155114436; 
 Sat, 23 Sep 2017 01:25:14 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1506155114; cv=none;
 d=google.com; s=arc-20160816;
 b=XVkFOlV1ofcRqTACiDNRqO1PA/jleh/jHGFu1cD5aRLZpOxpSHuZf0Y2tdBiIYfcVe
 uMYwpj8y8TTYDwHJe4wE8nGwkvQaKei0lAJc4mYi+cneuoNhG4H4yb/a56M5CsHO5PFq
 al5sLbV3nPLQIduI0UY+F/hM8/0+oKOZg6sP6Uvaj9HWf508zzLPAXZc5x5TTYQ9SALu
 vlIIZ++K0ECDzaWJ7qR8WNcTJ+48uagWSlW7q6e/OD8UlmVuC/JQaRmVMmN90gNZfnEK
 Pi2sTza6OTnlhtFqXMkIJnXjIqGmGdPwNw2ZBlYBaIPVh+uh3C5NszEXx2VVRvsfqpNx
 4Wyg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816; 
 h=mime-version:user-agent:message-id:date:subject:cc:mail-followup-to
 :to:from:delivered-to:sender:list-help:list-post:list-archive
 :list-unsubscribe:list-id:precedence:mailing-list:dkim-signature
 :domainkey-signature:arc-authentication-results;
 bh=NDr3xnk1uEwOBPD4S4wVsXdPIOBXzyuPRBDzgtDtOPc=;
 b=BXpjuTwZqVvijgppPJJE1k3nezUFoABBvZtECR94WlnZJTCs8TsLRu8agZpLzZOMi4
 YVR7BpCE3rsbRiR9hjCggLL5ZTSwNw2P9GQQJLl4XW5Zwaxg6WzR5P27+/h2oaNhuJ4a
 bQNGLsdM504BgFuNMcJv7Wv8WygAoAI2lmhY2i45xn3jF5BzPL5U4jluQNjdwsCYf9+a
 yWFO/I1Vkwq0DnpC78FTD72Eb5xAWdKG2qvmGp1YztbUSzP695vFJrUie5LP9dYaAYIM
 Y/0QSmuLiFxQPo/RpSMZ+Dmup7SFigcAqcHL52VQAQ475eO0Pzx3UmSWdIeZ9KbT89/Z
 rV0w==
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=pass header.i=@gcc.gnu.org header.s=default header.b=KfCsHVEl;
 spf=pass (google.com: domain of
 gcc-patches-return-462820-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-462820-patch=linaro.org@gcc.gnu.org;
 dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Return-Path: <gcc-patches-return-462820-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 c13si1037686pgn.262.2017.09.23.01.25.14 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Sat, 23 Sep 2017 01:25:14 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-462820-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com;
 dkim=pass header.i=@gcc.gnu.org header.s=default header.b=KfCsHVEl;
 spf=pass (google.com: domain of
 gcc-patches-return-462820-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-462820-patch=linaro.org@gcc.gnu.org;
 dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:from
 :to:cc:subject:date:message-id:mime-version:content-type; q=dns;
 s=default; b=CJ9m/tv+Dq+4GbmYCmIejFj8DaZOmY+50Fy9fnjbHERERXmD/l
 fww04jZrawnRDg7hF5fjNaXAjh3L1k7AwxcT3HPNdIXY76AzYRWRYH6L6hxf2IGD
 QOiO3bs0x7sdRm+4vC4ioQ3i9jvlPkrM+0BMe68KDkpREKM24Hf+bvSwA=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:from
 :to:cc:subject:date:message-id:mime-version:content-type; s=
 default; bh=ZQGF1lLOWq0W2iNyo713APFrSZw=; b=KfCsHVElPyxCPPC5dnZ1
 4PeV3mZN9wOmO19jaFh7NgrbEKvRh7eVtJm7RHPHd8RMsyizDmm7GQHYqLNOSpI8
 WfyTu2UROw1NH/00KJG9EtB3kuScNSq6Keg7dyQy6pE46WUozQtPcUye+QsMSaVJ
 TjX7yPR10Io4HgsabLJsc5g=
Received: (qmail 87446 invoked by alias); 23 Sep 2017 08:24:58 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 87436 invoked by uid 89); 23 Sep 2017 08:24:57 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-15.1 required=5.0 tests=AWL, BAYES_00,
 GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
 RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM,
 SPF_PASS autolearn=ham version=3.3.2 spammy=lo12
X-HELO: mail-wr0-f171.google.com
Received: from mail-wr0-f171.google.com (HELO mail-wr0-f171.google.com)
 (209.85.128.171) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Sat, 23 Sep 2017 08:24:54 +0000
Received: by mail-wr0-f171.google.com with SMTP id k20so2288494wre.4 for
 <gcc-patches@gcc.gnu.org>; Sat, 23 Sep 2017 01:24:53 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
 s=20161025;
 h=x-gm-message-state:from:to:mail-followup-to:cc:subject:date
 :message-id:user-agent:mime-version;
 bh=NDr3xnk1uEwOBPD4S4wVsXdPIOBXzyuPRBDzgtDtOPc=;
 b=NBIheAxpjhUHhgAkuJ9ANLax7mYK5TMQZnJVQNeqr0oScpvu5O/ESxYiPUCfncWZkK
 TI5ubnSPvWyfoqiieot7tq6E7RM+oYqqa4xWVqtz+A8qyFvrI0kb7FeLR3fqFAir083e
 ZBI2v22mmCzpbAEtFU/9yzBvDQ5V5cICN7r6laMiU+u2UPLeHpKePP2wHS7ZE9Ic31Lt
 cUvpUkfyioMfyrZzpep25RJ/+epqoROUcXhVn3G9Q8TQXFDZkpz/f08G1F3B6Pd1qz/e
 12E1VSBTbcX2zOrdqnDwGbtVhRFENLp9u5toMfiMaXdR88m0rg38QSZHW4Uo3kvvexlP
 Xe/w==
X-Gm-Message-State: AHPjjUh/cAHIHf2TR9TgIzyAaANLazCCMQu7RJ9VysaEqfp3DVngY8Cd	WlJATcHzV8RFjQnyG3DSu3ketg==
X-Google-Smtp-Source: AOwi7QBwrYHLP4n1paXB4Tf0maUdGLiHZzcpvYCWM7EgXafF91uME/peUGJbsCphmOs3fKhQ0FaCgQ==
X-Received: by 10.223.152.132 with SMTP id w4mr1219086wrb.264.1506155091983;
 Sat, 23 Sep 2017 01:24:51 -0700 (PDT)
Received: from localhost ([2.25.234.72]) by smtp.gmail.com with ESMTPSA id
 a19sm2058117wra.64.2017.09.23.01.24.50 (version=TLS1_2
 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Sat, 23 Sep 2017 01:24:51 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, dmalcolm@redhat.com,
 richard.sandiford@linaro.org
Cc: dmalcolm@redhat.com
Subject: Add more vec_duplicate simplifications
Date: Sat, 23 Sep 2017 09:24:48 +0100
Message-ID: <87d16hrfe7.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0

This patch adds a vec_duplicate_p helper that tests for constant
or non-constant vector duplicates.  Together with the existing
const_vec_duplicate_p, this complements the gen_vec_duplicate
and gen_const_vec_duplicate added by a previous patch.

The patch uses the new routines to add more rtx simplifications
involving vector duplicates.  These mirror simplifications that
we already do for CONST_VECTOR broadcasts and are needed for
variable-length SVE, which uses:

  (const:M (vec_duplicate:M X))

to represent constant broadcasts instead.  The simplifications
trigger on the testsuite for variable duplicates too, and in each
case I saw the change was an improvement.  E.g.:

- Several targets had this simplification in gcc.dg/pr49948.c
  when compiled at -O3:

    -Failed to match this instruction:
    +Successfully matched this instruction:
     (set (reg:DI 88)
    -    (subreg:DI (vec_duplicate:V2DI (reg/f:DI 75 [ _4 ])) 0))
    +    (reg/f:DI 75 [ _4 ]))

  On aarch64 this gives:

            ret
            .p2align 2
     .L8:
    +       adrp    x1, b
            sub     sp, sp, #80
    -       adrp    x2, b
    -       add     x1, sp, 12
    +       add     x2, sp, 12
            str     wzr, [x0, #:lo12:a]
    +       str     x2, [x1, #:lo12:b]
            mov     w0, 0
    -       dup     v0.2d, x1
    -       str     d0, [x2, #:lo12:b]
            add     sp, sp, 80
            ret
            .size   foo, .-foo

  On x86_64:

            jg      .L2
            leaq    -76(%rsp), %rax
            movl    $0, a(%rip)
    -       movq    %rax, -96(%rsp)
    -       movq    -96(%rsp), %xmm0
    -       punpcklqdq      %xmm0, %xmm0
    -       movq    %xmm0, b(%rip)
    +       movq    %rax, b(%rip)
     .L2:
            xorl    %eax, %eax
            ret

  etc.

- gcc.dg/torture/pr58018.c compiled at -O3 on aarch64 has an instance of:

     Trying 50, 52, 46 -> 53:
     Failed to match this instruction:
     (set (reg:V4SI 167)
    -    (and:V4SI (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
    -            (reg:V4SI 209))
    -        (const_vector:V4SI [
    -                (const_int 1 [0x1])
    -                (const_int 1 [0x1])
    -                (const_int 1 [0x1])
    -                (const_int 1 [0x1])
    -            ])))
    +    (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
    +        (reg:V4SI 209)))
     Successfully matched this instruction:
     (set (reg:V4SI 163 [ vect_patt_16.14 ])
         (vec_duplicate:V4SI (reg:SI 132 [ _165 ])))
    +Successfully matched this instruction:
    +(set (reg:V4SI 167)
    +    (and:V4SI (reg:V4SI 163 [ vect_patt_16.14 ])
    +        (reg:V4SI 209)))

  where (reg:SI 132) is the result of a scalar comparison and so
  is known to be 0 or 1.  This saves a MOVI and vector AND:

            cmp     w7, 4
            bls     .L15
            dup     v1.4s, w2
    -       lsr     w2, w1, 2
    +       dup     v2.4s, w6
            movi    v3.4s, 0
    -       mov     w0, 0
    -       movi    v2.4s, 0x1
    +       lsr     w2, w1, 2
            mvni    v0.4s, 0
    +       mov     w0, 0
            cmge    v1.4s, v1.4s, v3.4s
            and     v1.16b, v2.16b, v1.16b
    -       dup     v2.4s, w6
    -       and     v1.16b, v1.16b, v2.16b
            .p2align 3
     .L7:
            and     v0.16b, v0.16b, v1.16b

- powerpc64le has many instances of things like:

    -Failed to match this instruction:
    +Successfully matched this instruction:
     (set (reg:V4SI 161 [ vect_cst__24 ])
    -    (vec_select:V4SI (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
    -                (parallel [
    -                        (const_int 0 [0])
    -                    ])))
    -        (parallel [
    -                (const_int 2 [0x2])
    -                (const_int 3 [0x3])
    -                (const_int 0 [0])
    -                (const_int 1 [0x1])
    -            ])))
    +    (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
    +            (parallel [
    +                    (const_int 0 [0])
    +                ]))))

  This removes redundant XXPERMDIs from many tests.

The best way of testing the new simplifications seemed to be
via selftests.  The patch cribs part of David's patch here:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory, with the results above.  OK to install?

Richard


2017-09-22  Richard Sandiford  <richard.sandiford@linaro.org>
	    David Malcolm  <dmalcolm@redhat.com>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (vec_duplicate_p): New function.
	* selftest-rtl.c (assert_rtx_eq_at): New function.
	* selftest-rtl.h (ASSERT_RTX_EQ): New macro.
	(assert_rtx_eq_at): Declare.
	* selftest.h (selftest::simplify_rtx_c_tests): Declare.
	* selftest-run-tests.c (selftest::run_tests): Call it.
	* simplify-rtx.c: Include selftest.h and selftest-rtl.h.
	(simplify_unary_operation_1): Recursively handle vector duplicates.
	(simplify_binary_operation_1): Likewise.  Handle VEC_SELECTs of
	vector duplicates.
	(simplify_subreg): Handle subregs of vector duplicates.
	(make_test_reg, test_vector_ops_duplicate, test_vector_ops)
	(selftest::simplify_rtx_c_tests): New functions.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-09-21 11:53:16.490837253 +0100
+++ gcc/rtl.h	2017-09-23 09:14:44.753611808 +0100
@@ -2772,6 +2772,21 @@ const_vec_duplicate_p (T x, T *elt)
   return false;
 }
 
+/* Return true if X is a vector with a duplicated element value, either
+   constant or nonconstant.  Store the duplicated element in *ELT if so.  */
+
+template <typename T>
+inline bool
+vec_duplicate_p (T x, T *elt)
+{
+  if (GET_CODE (x) == VEC_DUPLICATE)
+    {
+      *elt = XEXP (x, 0);
+      return true;
+    }
+  return const_vec_duplicate_p (x, elt);
+}
+
 /* If X is a vector constant with a duplicated element value, return that
    element value, otherwise return X.  */
 
Index: gcc/selftest-rtl.c
===================================================================
--- gcc/selftest-rtl.c	2017-02-23 19:54:03.000000000 +0000
+++ gcc/selftest-rtl.c	2017-09-23 09:14:44.753611808 +0100
@@ -35,6 +35,29 @@ Software Foundation; either version 3, o
 
 namespace selftest {
 
+/* Compare rtx EXPECTED and ACTUAL using rtx_equal_p, calling
+   ::selftest::pass if they are equal, aborting if they are non-equal.
+   LOC is the effective location of the assertion, MSG describes it.  */
+
+void
+assert_rtx_eq_at (const location &loc, const char *msg,
+		  rtx expected, rtx actual)
+{
+  if (rtx_equal_p (expected, actual))
+    ::selftest::pass (loc, msg);
+  else
+    {
+      fprintf (stderr, "%s:%i: %s: FAIL: %s\n", loc.m_file, loc.m_line,
+	       loc.m_function, msg);
+      fprintf (stderr, "  expected: ");
+      print_rtl (stderr, expected);
+      fprintf (stderr, "\n  actual: ");
+      print_rtl (stderr, actual);
+      fprintf (stderr, "\n");
+      abort ();
+    }
+}
+
 /* Compare rtx EXPECTED and ACTUAL by pointer equality, calling
    ::selftest::pass if they are equal, aborting if they are non-equal.
    LOC is the effective location of the assertion, MSG describes it.  */
Index: gcc/selftest-rtl.h
===================================================================
--- gcc/selftest-rtl.h	2017-02-23 19:54:03.000000000 +0000
+++ gcc/selftest-rtl.h	2017-09-23 09:14:44.753611808 +0100
@@ -47,6 +47,15 @@ #define ASSERT_RTL_DUMP_EQ_WITH_REUSE(EX
   assert_rtl_dump_eq (SELFTEST_LOCATION, (EXPECTED_DUMP), (RTX), \
 		      (REUSE_MANAGER))
 
+#define ASSERT_RTX_EQ(EXPECTED, ACTUAL) 				\
+  SELFTEST_BEGIN_STMT							\
+  const char *desc = "ASSERT_RTX_EQ (" #EXPECTED ", " #ACTUAL ")";	\
+  ::selftest::assert_rtx_eq_at (SELFTEST_LOCATION, desc, (EXPECTED),	\
+				(ACTUAL));				\
+  SELFTEST_END_STMT
+
+extern void assert_rtx_eq_at (const location &, const char *, rtx, rtx);
+
 /* Evaluate rtx EXPECTED and ACTUAL and compare them with ==
    (i.e. pointer equality), calling ::selftest::pass if they are
    equal, aborting if they are non-equal.  */
Index: gcc/selftest.h
===================================================================
--- gcc/selftest.h	2017-06-12 17:05:23.270759359 +0100
+++ gcc/selftest.h	2017-09-23 09:14:44.754563641 +0100
@@ -197,6 +197,7 @@ extern void tree_cfg_c_tests ();
 extern void vec_c_tests ();
 extern void wide_int_cc_tests ();
 extern void predict_c_tests ();
+extern void simplify_rtx_c_tests ();
 
 extern int num_passes;
 
Index: gcc/selftest-run-tests.c
===================================================================
--- gcc/selftest-run-tests.c	2017-06-12 17:05:23.505761496 +0100
+++ gcc/selftest-run-tests.c	2017-09-23 09:14:44.754563641 +0100
@@ -93,6 +93,7 @@ selftest::run_tests ()
 
   store_merging_c_tests ();
   predict_c_tests ();
+  simplify_rtx_c_tests ();
 
   /* Run any lang-specific selftests.  */
   lang_hooks.run_lang_selftests ();
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-09-23 08:56:06.723183240 +0100
+++ gcc/simplify-rtx.c	2017-09-23 09:14:44.755515474 +0100
@@ -33,6 +33,8 @@ Software Foundation; either version 3, o
 #include "diagnostic-core.h"
 #include "varasm.h"
 #include "flags.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 /* Simplification and canonicalization of RTL.  */
 
@@ -925,7 +927,7 @@ exact_int_to_float_conversion_p (const_r
 simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
 {
   enum rtx_code reversed;
-  rtx temp;
+  rtx temp, elt;
   scalar_int_mode inner, int_mode, op_mode, op0_mode;
 
   switch (code)
@@ -1681,6 +1683,28 @@ simplify_unary_operation_1 (enum rtx_cod
       break;
     }
 
+  if (VECTOR_MODE_P (mode) && vec_duplicate_p (op, &elt))
+    {
+      /* Try applying the operator to ELT and see if that simplifies.
+	 We can duplicate the result if so.
+
+	 The reason we don't use simplify_gen_unary is that it isn't
+	 necessarily a win to convert things like:
+
+	   (neg:V (vec_duplicate:V (reg:S R)))
+
+	 to:
+
+	   (vec_duplicate:V (neg:S (reg:S R)))
+
+	 The first might be done entirely in vector registers while the
+	 second might need a move between register files.  */
+      temp = simplify_unary_operation (code, GET_MODE_INNER (mode),
+				       elt, GET_MODE_INNER (GET_MODE (op)));
+      if (temp)
+	return gen_vec_duplicate (mode, temp);
+    }
+
   return 0;
 }
 
@@ -2138,7 +2162,7 @@ simplify_binary_operation (enum rtx_code
 simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 			     rtx op0, rtx op1, rtx trueop0, rtx trueop1)
 {
-  rtx tem, reversed, opleft, opright;
+  rtx tem, reversed, opleft, opright, elt0, elt1;
   HOST_WIDE_INT val;
   unsigned int width = GET_MODE_PRECISION (mode);
   scalar_int_mode int_mode, inner_mode;
@@ -3505,6 +3529,9 @@ simplify_binary_operation_1 (enum rtx_co
 	  gcc_assert (XVECLEN (trueop1, 0) == 1);
 	  gcc_assert (CONST_INT_P (XVECEXP (trueop1, 0, 0)));
 
+	  if (vec_duplicate_p (trueop0, &elt0))
+	    return elt0;
+
 	  if (GET_CODE (trueop0) == CONST_VECTOR)
 	    return CONST_VECTOR_ELT (trueop0, INTVAL (XVECEXP
 						      (trueop1, 0, 0)));
@@ -3587,9 +3614,6 @@ simplify_binary_operation_1 (enum rtx_co
 				    tmp_op, gen_rtx_PARALLEL (VOIDmode, vec));
 	      return tmp;
 	    }
-	  if (GET_CODE (trueop0) == VEC_DUPLICATE
-	      && GET_MODE (XEXP (trueop0, 0)) == mode)
-	    return XEXP (trueop0, 0);
 	}
       else
 	{
@@ -3598,6 +3622,11 @@ simplify_binary_operation_1 (enum rtx_co
 		      == GET_MODE_INNER (GET_MODE (trueop0)));
 	  gcc_assert (GET_CODE (trueop1) == PARALLEL);
 
+	  if (vec_duplicate_p (trueop0, &elt0))
+	    /* It doesn't matter which elements are selected by trueop1,
+	       because they are all the same.  */
+	    return gen_vec_duplicate (mode, elt0);
+
 	  if (GET_CODE (trueop0) == CONST_VECTOR)
 	    {
 	      int elt_size = GET_MODE_UNIT_SIZE (mode);
@@ -3898,6 +3927,32 @@ simplify_binary_operation_1 (enum rtx_co
       gcc_unreachable ();
     }
 
+  if (mode == GET_MODE (op0)
+      && mode == GET_MODE (op1)
+      && vec_duplicate_p (op0, &elt0)
+      && vec_duplicate_p (op1, &elt1))
+    {
+      /* Try applying the operator to ELT and see if that simplifies.
+	 We can duplicate the result if so.
+
+	 The reason we don't use simplify_gen_binary is that it isn't
+	 necessarily a win to convert things like:
+
+	   (plus:V (vec_duplicate:V (reg:S R1))
+		   (vec_duplicate:V (reg:S R2)))
+
+	 to:
+
+	   (vec_duplicate:V (plus:S (reg:S R1) (reg:S R2)))
+
+	 The first might be done entirely in vector registers while the
+	 second might need a move between register files.  */
+      tem = simplify_binary_operation (code, GET_MODE_INNER (mode),
+				       elt0, elt1);
+      if (tem)
+	return gen_vec_duplicate (mode, tem);
+    }
+
   return 0;
 }
 
@@ -6046,6 +6101,20 @@ simplify_subreg (machine_mode outermode,
   if (outermode == innermode && !byte)
     return op;
 
+  if (byte % GET_MODE_UNIT_SIZE (innermode) == 0)
+    {
+      rtx elt;
+
+      if (VECTOR_MODE_P (outermode)
+	  && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
+	  && vec_duplicate_p (op, &elt))
+	return gen_vec_duplicate (outermode, elt);
+
+      if (outermode == GET_MODE_INNER (innermode)
+	  && vec_duplicate_p (op, &elt))
+	return elt;
+    }
+
   if (CONST_SCALAR_INT_P (op)
       || CONST_DOUBLE_AS_FLOAT_P (op)
       || GET_CODE (op) == CONST_FIXED
@@ -6351,3 +6420,125 @@ simplify_rtx (const_rtx x)
     }
   return NULL;
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Make a unique pseudo REG of mode MODE for use by selftests.  */
+
+static rtx
+make_test_reg (machine_mode mode)
+{
+  static int test_reg_num = LAST_VIRTUAL_REGISTER + 1;
+
+  return gen_rtx_REG (mode, test_reg_num++);
+}
+
+/* Test vector simplifications in which the operands and result have
+   vector mode MODE.  SCALAR_REG is a pseudo register that holds one
+   element of MODE.  */
+
+static void
+test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
+{
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
+  unsigned int nunits = GET_MODE_NUNITS (mode);
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      /* Test some simple unary cases with VEC_DUPLICATE arguments.  */
+      rtx not_scalar_reg = gen_rtx_NOT (inner_mode, scalar_reg);
+      rtx duplicate_not = gen_rtx_VEC_DUPLICATE (mode, not_scalar_reg);
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_unary_operation (NOT, mode,
+					       duplicate_not, mode));
+
+      rtx neg_scalar_reg = gen_rtx_NEG (inner_mode, scalar_reg);
+      rtx duplicate_neg = gen_rtx_VEC_DUPLICATE (mode, neg_scalar_reg);
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_unary_operation (NEG, mode,
+					       duplicate_neg, mode));
+
+      /* Test some simple binary cases with VEC_DUPLICATE arguments.  */
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_binary_operation (PLUS, mode, duplicate,
+						CONST0_RTX (mode)));
+
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_binary_operation (MINUS, mode, duplicate,
+						CONST0_RTX (mode)));
+
+      ASSERT_RTX_PTR_EQ (CONST0_RTX (mode),
+			 simplify_binary_operation (MINUS, mode, duplicate,
+						    duplicate));
+    }
+
+  /* Test a scalar VEC_SELECT of a VEC_DUPLICATE.  */
+  rtx zero_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const0_rtx));
+  ASSERT_RTX_PTR_EQ (scalar_reg,
+		     simplify_binary_operation (VEC_SELECT, inner_mode,
+						duplicate, zero_par));
+
+  /* And again with the final element.  */
+  rtx last_index = gen_int_mode (GET_MODE_NUNITS (mode) - 1, word_mode);
+  rtx last_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, last_index));
+  ASSERT_RTX_PTR_EQ (scalar_reg,
+		     simplify_binary_operation (VEC_SELECT, inner_mode,
+						duplicate, last_par));
+
+  /* Test a scalar subreg of a VEC_DUPLICATE.  */
+  unsigned int offset = subreg_lowpart_offset (inner_mode, mode);
+  ASSERT_RTX_EQ (scalar_reg,
+		 simplify_gen_subreg (inner_mode, duplicate,
+				      mode, offset));
+
+  machine_mode narrower_mode;
+  if (nunits > 2
+      && mode_for_vector (inner_mode, 2).exists (&narrower_mode)
+      && VECTOR_MODE_P (narrower_mode))
+    {
+      /* Test VEC_SELECT of a vector.  */
+      rtx vec_par
+	= gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, const1_rtx, const0_rtx));
+      rtx narrower_duplicate
+	= gen_rtx_VEC_DUPLICATE (narrower_mode, scalar_reg);
+      ASSERT_RTX_EQ (narrower_duplicate,
+		     simplify_binary_operation (VEC_SELECT, narrower_mode,
+						duplicate, vec_par));
+
+      /* Test a vector subreg of a VEC_DUPLICATE.  */
+      unsigned int offset = subreg_lowpart_offset (narrower_mode, mode);
+      ASSERT_RTX_EQ (narrower_duplicate,
+		     simplify_gen_subreg (narrower_mode, duplicate,
+					  mode, offset));
+    }
+}
+
+/* Verify some simplifications involving vectors.  */
+
+static void
+test_vector_ops ()
+{
+  for (unsigned int i = 0; i < NUM_MACHINE_MODES; ++i)
+    {
+      machine_mode mode = (machine_mode) i;
+      if (VECTOR_MODE_P (mode))
+	{
+	  rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
+	  test_vector_ops_duplicate (mode, scalar_reg);
+	}
+    }
+}
+
+/* Run all of the selftests within this file.  */
+
+void
+simplify_rtx_c_tests ()
+{
+  test_vector_ops ();
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */