From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 30 Apr 2019 22:05:20 -0700
Message-Id: <20190501050536.15580-14-richard.henderson@linaro.org>
In-Reply-To: <20190501050536.15580-1-richard.henderson@linaro.org>
References: <20190501050536.15580-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v2 13/29] tcg: Add INDEX_op_dup_mem_vec

Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first.  This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

  VECE == 32/64
    Load the value into a vector register, then dup.
    Both of these must work.

  VECE == 8/16
    If the value happens to be at an offset such that an aligned
    load would place the desired value in the least significant
    end of the register, go ahead and load w/garbage in high bits.

    Load the value w/INDEX_op_ld{8,16}_i32.

    Attempt a move directly to vector reg, which may fail.

    Store the value into the backing store for OTS.

    Load the value into the vector reg w/TCG_TYPE_I32, which must work.

    Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.
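For illustration only (not part of this patch): a sketch of how a front end
could use the new op to splat a scalar that lives in the CPU state straight
into a vector temporary, without bouncing through an integer temp.  The
CPUArchState fields and offsets below are made up.

    /* Hypothetical example: broadcast the 32-bit scalar at env->foo
     * (assumed field) into every lane of a 128-bit vector temp, then
     * store it over an assumed 16-byte guest register slot.
     */
    TCGv_vec t = tcg_temp_new_vec(TCG_TYPE_V128);
    tcg_gen_dup_mem_vec(MO_32, t, cpu_env, offsetof(CPUArchState, foo));
    tcg_gen_stl_vec(t, cpu_env, offsetof(CPUArchState, vregs[0]),
                    TCG_TYPE_V128);
    tcg_temp_free_vec(t);
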
Signed-off-by: Richard Henderson
---
 tcg/tcg-op.h                 |  1 +
 tcg/tcg-opc.h                |  1 +
 tcg/aarch64/tcg-target.inc.c |  4 ++
 tcg/i386/tcg-target.inc.c    |  4 ++
 tcg/tcg-op-gvec.c            | 89 +++++++++++++++++++-----------------
 tcg/tcg-op-vec.c             | 11 +++++
 tcg/tcg.c                    |  1 +
 7 files changed, 70 insertions(+), 41 deletions(-)

-- 
2.17.1

Reviewed-by: Alex Bennée

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 1f1824c30a..9fff9864f6 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -954,6 +954,7 @@ void tcg_gen_atomic_umax_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
 void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
+void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
 void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 1bad6e4208..4bf71f261f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -219,6 +219,7 @@ DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
 
 DEF(ld_vec, 1, 1, 1, IMPLVEC)
 DEF(st_vec, 0, 2, 1, IMPLVEC)
+DEF(dupm_vec, 1, 1, 1, IMPLVEC)
 
 DEF(add_vec, 1, 2, 0, IMPLVEC)
 DEF(sub_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 411fb463ac..16381f5175 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2192,6 +2192,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        break;
     case INDEX_op_add_vec:
         tcg_out_insn(s, 3616, ADD, is_q, vece, a0, a1, a2);
         break;
@@ -2524,6 +2527,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_w;
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
+    case INDEX_op_dupm_vec:
         return &w_r;
     case INDEX_op_dup_vec:
         return &w_wr;
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index f4bd00e24f..5b33bbd99b 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2829,6 +2829,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        break;
 
     case INDEX_op_x86_shufps_vec:
         insn = OPC_SHUFPS;
@@ -3115,6 +3118,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
+    case INDEX_op_dupm_vec:
         return &x_r;
 
     case INDEX_op_add_vec:
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 3fcb2352d9..35ebc5a201 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -395,6 +395,41 @@ static TCGType choose_vector_type(const TCGOpcode *list, unsigned vece,
     return 0;
 }
 
+static void do_dup_store(TCGType type, uint32_t dofs, uint32_t oprsz,
+                         uint32_t maxsz, TCGv_vec t_vec)
+{
+    uint32_t i = 0;
+
+    switch (type) {
+    case TCG_TYPE_V256:
+        /*
+         * Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         */
+        for (; i + 32 <= oprsz; i += 32) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
+        }
+        /* fallthru */
+    case TCG_TYPE_V128:
+        for (; i + 16 <= oprsz; i += 16) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
+        }
+        break;
+    case TCG_TYPE_V64:
+        for (; i < oprsz; i += 8) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
 /* Set OPRSZ bytes at DOFS to replications of IN_32, IN_64 or IN_C.
  * Only one of IN_32 or IN_64 may be set;
  * IN_C is used if IN_32 and IN_64 are unset.
@@ -434,49 +469,11 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
         } else if (in_64) {
             tcg_gen_dup_i64_vec(vece, t_vec, in_64);
         } else {
-            switch (vece) {
-            case MO_8:
-                tcg_gen_dup8i_vec(t_vec, in_c);
-                break;
-            case MO_16:
-                tcg_gen_dup16i_vec(t_vec, in_c);
-                break;
-            case MO_32:
-                tcg_gen_dup32i_vec(t_vec, in_c);
-                break;
-            default:
-                tcg_gen_dup64i_vec(t_vec, in_c);
-                break;
-            }
+            tcg_gen_dupi_vec(vece, t_vec, in_c);
         }
-
-        i = 0;
-        switch (type) {
-        case TCG_TYPE_V256:
-            /* Recall that ARM SVE allows vector sizes that are not a
-             * power of 2, but always a multiple of 16.  The intent is
-             * that e.g. size == 80 would be expanded with 2x32 + 1x16.
-             */
-            for (; i + 32 <= oprsz; i += 32) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
-            }
-            /* fallthru */
-        case TCG_TYPE_V128:
-            for (; i + 16 <= oprsz; i += 16) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
-            }
-            break;
-        case TCG_TYPE_V64:
-            for (; i < oprsz; i += 8) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
-            }
-            break;
-        default:
-            g_assert_not_reached();
-        }
-
+        do_dup_store(type, dofs, oprsz, maxsz, t_vec);
         tcg_temp_free_vec(t_vec);
-        goto done;
+        return;
     }
 
     /* Otherwise, inline with an integer type, unless "large". */
@@ -1449,6 +1446,16 @@ void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz,
 void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t oprsz, uint32_t maxsz)
 {
+    if (vece <= MO_64) {
+        TCGType type = choose_vector_type(0, vece, oprsz, 0);
+        if (type != 0) {
+            TCGv_vec t_vec = tcg_temp_new_vec(type);
+            tcg_gen_dup_mem_vec(vece, t_vec, cpu_env, aofs);
+            do_dup_store(type, dofs, oprsz, maxsz, t_vec);
+            tcg_temp_free_vec(t_vec);
+            return;
+        }
+    }
     if (vece <= MO_32) {
         TCGv_i32 in = tcg_temp_new_i32();
         switch (vece) {
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 914fe42b1e..213d2e22aa 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -278,6 +278,17 @@ void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec r, TCGv_i32 a)
     vec_gen_2(INDEX_op_dup_vec, type, vece, ri, ai);
 }
 
+void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec r, TCGv_ptr b,
+                         tcg_target_long ofs)
+{
+    TCGArg ri = tcgv_vec_arg(r);
+    TCGArg bi = tcgv_ptr_arg(b);
+    TCGTemp *rt = arg_temp(ri);
+    TCGType type = rt->base_type;
+
+    vec_gen_3(INDEX_op_dupm_vec, type, vece, ri, bi, ofs);
+}
+
 static void vec_gen_ldst(TCGOpcode opc, TCGv_vec r, TCGv_ptr b, TCGArg o)
 {
     TCGArg ri = tcgv_vec_arg(r);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ef01cf8214..bb1e124e80 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1600,6 +1600,7 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_mov_vec:
     case INDEX_op_dup_vec:
     case INDEX_op_dupi_vec:
+    case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
    case INDEX_op_add_vec:
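
(Illustration only, offsets made up.)  Front ends that only need the
memory-to-memory form can stay at the gvec level; with this patch,
tcg_gen_gvec_dup_mem expands through dupm_vec whenever the host provides a
suitable vector type, instead of loading into an integer temp first:

    /* Replicate the byte at an assumed env offset across a 16-byte
     * destination region, clearing the remainder up to a 64-byte max.
     */
    tcg_gen_gvec_dup_mem(MO_8,
                         offsetof(CPUArchState, vregs[1]),  /* dofs (assumed) */
                         offsetof(CPUArchState, foo),       /* aofs (assumed) */
                         16, 64);                           /* oprsz, maxsz */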