From patchwork Mon Dec 18 17:17:33 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122249 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3106492qgn; Mon, 18 Dec 2017 09:25:22 -0800 (PST) X-Google-Smtp-Source: ACJfBov+wilwcZRtBQ8uhdSRKvG+xnCkclxdFCKeYVI5b2pTNfBjUw99PkJp7aBESJmEFOaePvyH X-Received: by 10.37.30.67 with SMTP id e64mr430362ybe.103.1513617922718; Mon, 18 Dec 2017 09:25:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617922; cv=none; d=google.com; s=arc-20160816; b=AKOOMo5iUf/93By58hut9e1E2XQyq+THXgWT9cyiuzeCl2pa4StFRi5fjBs1DS8pF6 mmX30+4Ikmoc1Zub1Q4BQzAEip/SVUHKyUSc2U+c3Pd+s7sSxkVhJExVxhIAaIGed+e/ ivVT4KzaW9sxCHL7flz7namviNK7tgZLv3Jj0VOTlYq9oGSlDEjLANIXw9DVJT9MxWgA Z330eEEc02j/E8bNKBryyDGFHzsCnEu/I415Z9JeCYxfiDjDV1QNu96Z+n0/8Yju3vd2 Ea96EyJCKvmuTBUOTPL8l022Q6N4nyiut64y+w3UU2b+y3nx/sk5tXEksZiHHC4Yj6Mx kWuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=3ATsLciIpK6MJtUY82ncrcWUkwDTu7ZiPShPSdCMTh0=; b=ANCNm0sbyr0wUj8+dnVnBbUnP3mvGP284+jK4jmz42okDXLyHqrJUe29T7K+Cy17nD VUf8hTzr6z1OsJI/akACQNnWycjY66bpK4S3fEzq4uI6NSV+d8X57Q2bcAgfg++IN4Rb 4BgtI/9zEfJuA/34AEMJSlu2N9UN3qK79Uw310izLLYyb/1mchVsF4h0CoVsO1dCNWg2 hO9Hc2TZMYElTmSppI3l4ZcQwgxzGN9tkdmjz25XMztDQlKiGF+HbQTdSEbSvFsbKW+A uZkCnJVgh8QKCgjzE4lq6mPQ2VtooeO5HCyMuINn0ZfqcLEObCNXCX3C6ud2IDO1HRMm J8AA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=OE/F8sap; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id m187si2459141ywc.323.2017.12.18.09.25.22 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:25:22 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=OE/F8sap; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58564 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzAI-0006Go-3X for patch@linaro.org; Mon, 18 Dec 2017 12:25:22 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36935) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3S-0000Nf-Ng for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3E-0000hq-GC for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:18 -0500 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:40459) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3E-0000gQ-68 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:04 -0500 Received: by mail-pf0-x241.google.com with SMTP id v26so9936679pfl.7 for ; Mon, 18 Dec 2017 09:18:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=3ATsLciIpK6MJtUY82ncrcWUkwDTu7ZiPShPSdCMTh0=; b=OE/F8sap+FBeTPJPO/gtW4phlY6nVdrHD/UYsxZLESbhyZIwmYDJCFhm+7yzELZq6p r3nYhOw2pSZYfA+QKWs+X5iCEVeTEg0wbOeAiYlPXcodIdP05EvCgUy2kLeg1JMndfpM 1wFtj6UXsThlgPPDkunrlweT00EgaYDu8azFE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=3ATsLciIpK6MJtUY82ncrcWUkwDTu7ZiPShPSdCMTh0=; b=g9D/0/Ej8vzqw6pFs4YNX0tfueSTmldWG5ZBWMQ0HIuXi4eXmU5iXXl3bAU06HTQGi GKS6uIXrDuPFbNknHpur5uH/3G25eDjkC7lnkaIZkqao1a9vFbanTWB060m1qvLYs8ta eViarKSOhukpZk/NymnETZCgiyvuJhHmAuCj3naD8GPZOX/RyFo7h3qDGxfyChMCEs8G NB2PbesxqFny+ZKqS0VKXzRPa+UhRi55iMlHdQH2sdIZmHSfBkS9+pkUuPImgVDjxbPU UlxqkiGb8U2skU2ZN0KHNUGPC28YfIeSX61hmBBpZKq9GcmEl+q25cl5KAXs/UtwL0Zr 3Aqw== X-Gm-Message-State: AKGB3mLQ9STeJYtU6u54Kq2GRc+rKRYAzl7SlwXLh5IIFVZlplL5MXAp BpeySiVu/kIy1x7WzZ7j0erB12v6Z3g= X-Received: by 10.101.83.67 with SMTP id w3mr375304pgr.322.1513617482562; Mon, 18 Dec 2017 09:18:02 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.01 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:01 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:33 -0800 Message-Id: <20171218171758.16964-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v7 01/26] tcg: Add types and basic operations for host vectors X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Nothing uses or enables them yet. Signed-off-by: Richard Henderson --- Makefile.target | 4 +- tcg/tcg-op.h | 30 +++++ tcg/tcg-opc.h | 26 ++++ tcg/tcg.h | 56 +++++++++ tcg/tcg-op-vec.c | 362 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ tcg/tcg.c | 100 ++++++++++++++- tcg/README | 58 +++++++++ 7 files changed, 630 insertions(+), 6 deletions(-) create mode 100644 tcg/tcg-op-vec.c -- 2.14.3 diff --git a/Makefile.target b/Makefile.target index f9a9da7e7c..7f30a1e725 100644 --- a/Makefile.target +++ b/Makefile.target @@ -93,8 +93,8 @@ all: $(PROGS) stap # cpu emulator library obj-y += exec.o obj-y += accel/ -obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o -obj-$(CONFIG_TCG) += tcg/tcg-common.o +obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o +obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o obj-y += fpu/softfloat.o diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index ca07b32b65..9b0560e4d3 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -35,6 +35,10 @@ void tcg_gen_op4(TCGOpcode, TCGArg, TCGArg, TCGArg, TCGArg); void tcg_gen_op5(TCGOpcode, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg); void tcg_gen_op6(TCGOpcode, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg); +void vec_gen_2(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg); +void vec_gen_3(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg, TCGArg); +void vec_gen_4(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg, TCGArg, TCGArg); + static inline void tcg_gen_op1_i32(TCGOpcode opc, TCGv_i32 a1) { tcg_gen_op1(opc, tcgv_i32_arg(a1)); @@ -903,6 +907,30 @@ void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp); void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); +void tcg_gen_mov_vec(TCGv_vec, TCGv_vec); +void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32); +void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64); +void tcg_gen_dup8i_vec(TCGv_vec, uint32_t); +void tcg_gen_dup16i_vec(TCGv_vec, uint32_t); +void tcg_gen_dup32i_vec(TCGv_vec, uint32_t); +void tcg_gen_dup64i_vec(TCGv_vec, uint64_t); +void tcg_gen_movi_v64(TCGv_vec, uint64_t); +void tcg_gen_movi_v128(TCGv_vec, uint64_t, uint64_t); +void tcg_gen_movi_v256(TCGv_vec, uint64_t, uint64_t, uint64_t, uint64_t); +void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a); + +void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); +void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); +void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t); + #if TARGET_LONG_BITS == 64 #define tcg_gen_movi_tl tcg_gen_movi_i64 #define tcg_gen_mov_tl tcg_gen_mov_i64 @@ -1001,6 +1029,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); #define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i64 #define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i64 #define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i64 +#define tcg_gen_dup_tl_vec tcg_gen_dup_i64_vec #else #define tcg_gen_movi_tl tcg_gen_movi_i32 #define tcg_gen_mov_tl tcg_gen_mov_i32 @@ -1098,6 +1127,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); #define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i32 #define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i32 #define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i32 +#define tcg_gen_dup_tl_vec tcg_gen_dup_i32_vec #endif #if UINTPTR_MAX == UINT32_MAX diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index 956fb1e9f3..4e62eda14b 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -204,8 +204,34 @@ DEF(qemu_ld_i64, DATA64_ARGS, TLADDR_ARGS, 1, DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT) +/* Host vector support. */ + +#define IMPLVEC TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec) + +DEF(mov_vec, 1, 1, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT) +DEF(movi_vec, 1, 0, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT) /* vecl defines const args */ +DEF(dupi_vec, 1, 0, 1, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT) + +DEF(dup_vec, 1, 1, 0, IMPLVEC) +DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32)) + +DEF(ld_vec, 1, 1, 1, IMPLVEC) +DEF(st_vec, 0, 2, 1, IMPLVEC) + +DEF(add_vec, 1, 2, 0, IMPLVEC) +DEF(sub_vec, 1, 2, 0, IMPLVEC) +DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec)) + +DEF(and_vec, 1, 2, 0, IMPLVEC) +DEF(or_vec, 1, 2, 0, IMPLVEC) +DEF(xor_vec, 1, 2, 0, IMPLVEC) +DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) +DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) +DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) + #undef TLADDR_ARGS #undef DATA64_ARGS #undef IMPL #undef IMPL64 +#undef IMPLVEC #undef DEF diff --git a/tcg/tcg.h b/tcg/tcg.h index 2ce497cebf..dce483b0ee 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -170,6 +170,27 @@ typedef uint64_t TCGRegSet; # error "Missing unsigned widening multiply" #endif +#if !defined(TCG_TARGET_HAS_v64) \ + && !defined(TCG_TARGET_HAS_v128) \ + && !defined(TCG_TARGET_HAS_v256) +#define TCG_TARGET_MAYBE_vec 0 +#define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_not_vec 0 +#define TCG_TARGET_HAS_andc_vec 0 +#define TCG_TARGET_HAS_orc_vec 0 +#else +#define TCG_TARGET_MAYBE_vec 1 +#endif +#ifndef TCG_TARGET_HAS_v64 +#define TCG_TARGET_HAS_v64 0 +#endif +#ifndef TCG_TARGET_HAS_v128 +#define TCG_TARGET_HAS_v128 0 +#endif +#ifndef TCG_TARGET_HAS_v256 +#define TCG_TARGET_HAS_v256 0 +#endif + #ifndef TARGET_INSN_START_EXTRA_WORDS # define TARGET_INSN_START_WORDS 1 #else @@ -246,6 +267,11 @@ typedef struct TCGPool { typedef enum TCGType { TCG_TYPE_I32, TCG_TYPE_I64, + + TCG_TYPE_V64, + TCG_TYPE_V128, + TCG_TYPE_V256, + TCG_TYPE_COUNT, /* number of different types */ /* An alias for the size of the host register. */ @@ -396,6 +422,8 @@ typedef tcg_target_ulong TCGArg; * TCGv_i32 : 32 bit integer type * TCGv_i64 : 64 bit integer type * TCGv_ptr : a host pointer type + * TCGv_vec : a host vector type; the exact size is not exposed + to the CPU front-end code. * TCGv : an integer type the same size as target_ulong (an alias for either TCGv_i32 or TCGv_i64) The compiler's type checking will complain if you mix them @@ -418,6 +446,7 @@ typedef tcg_target_ulong TCGArg; typedef struct TCGv_i32_d *TCGv_i32; typedef struct TCGv_i64_d *TCGv_i64; typedef struct TCGv_ptr_d *TCGv_ptr; +typedef struct TCGv_vec_d *TCGv_vec; typedef TCGv_ptr TCGv_env; #if TARGET_LONG_BITS == 32 #define TCGv TCGv_i32 @@ -589,6 +618,9 @@ typedef struct TCGOp { #define TCGOP_CALLI(X) (X)->param1 #define TCGOP_CALLO(X) (X)->param2 +#define TCGOP_VECL(X) (X)->param1 +#define TCGOP_VECE(X) (X)->param2 + /* Make sure operands fit in the bitfields above. */ QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8)); @@ -726,6 +758,11 @@ static inline TCGTemp *tcgv_ptr_temp(TCGv_ptr v) return tcgv_i32_temp((TCGv_i32)v); } +static inline TCGTemp *tcgv_vec_temp(TCGv_vec v) +{ + return tcgv_i32_temp((TCGv_i32)v); +} + static inline TCGArg tcgv_i32_arg(TCGv_i32 v) { return temp_arg(tcgv_i32_temp(v)); @@ -741,6 +778,11 @@ static inline TCGArg tcgv_ptr_arg(TCGv_ptr v) return temp_arg(tcgv_ptr_temp(v)); } +static inline TCGArg tcgv_vec_arg(TCGv_vec v) +{ + return temp_arg(tcgv_vec_temp(v)); +} + static inline TCGv_i32 temp_tcgv_i32(TCGTemp *t) { (void)temp_idx(t); /* trigger embedded assert */ @@ -757,6 +799,11 @@ static inline TCGv_ptr temp_tcgv_ptr(TCGTemp *t) return (TCGv_ptr)temp_tcgv_i32(t); } +static inline TCGv_vec temp_tcgv_vec(TCGTemp *t) +{ + return (TCGv_vec)temp_tcgv_i32(t); +} + #if TCG_TARGET_REG_BITS == 32 static inline TCGv_i32 TCGV_LOW(TCGv_i64 t) { @@ -832,9 +879,12 @@ TCGTemp *tcg_global_mem_new_internal(TCGType, TCGv_ptr, TCGv_i32 tcg_temp_new_internal_i32(int temp_local); TCGv_i64 tcg_temp_new_internal_i64(int temp_local); +TCGv_vec tcg_temp_new_vec(TCGType type); +TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match); void tcg_temp_free_i32(TCGv_i32 arg); void tcg_temp_free_i64(TCGv_i64 arg); +void tcg_temp_free_vec(TCGv_vec arg); static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset, const char *name) @@ -916,6 +966,8 @@ enum { /* Instruction is optional and not implemented by the host, or insn is generic and should not be implemened by the host. */ TCG_OPF_NOT_PRESENT = 0x10, + /* Instruction operands are vectors. */ + TCG_OPF_VECTOR = 0x20, }; typedef struct TCGOpDef { @@ -981,6 +1033,10 @@ TCGv_i32 tcg_const_i32(int32_t val); TCGv_i64 tcg_const_i64(int64_t val); TCGv_i32 tcg_const_local_i32(int32_t val); TCGv_i64 tcg_const_local_i64(int64_t val); +TCGv_vec tcg_const_zeros_vec(TCGType); +TCGv_vec tcg_const_ones_vec(TCGType); +TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec); +TCGv_vec tcg_const_ones_vec_matching(TCGv_vec); TCGLabel *gen_new_label(void); diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c new file mode 100644 index 0000000000..dc04c11860 --- /dev/null +++ b/tcg/tcg-op-vec.c @@ -0,0 +1,362 @@ +/* + * Tiny Code Generator for QEMU + * + * Copyright (c) 2008 Fabrice Bellard + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "cpu.h" +#include "exec/exec-all.h" +#include "tcg.h" +#include "tcg-op.h" +#include "tcg-mo.h" + +/* Reduce the number of ifdefs below. This assumes that all uses of + TCGV_HIGH and TCGV_LOW are properly protected by a conditional that + the compiler can eliminate. */ +#if TCG_TARGET_REG_BITS == 64 +extern TCGv_i32 TCGV_LOW_link_error(TCGv_i64); +extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64); +#define TCGV_LOW TCGV_LOW_link_error +#define TCGV_HIGH TCGV_HIGH_link_error +#endif + +void vec_gen_2(TCGOpcode opc, TCGType type, unsigned vece, TCGArg r, TCGArg a) +{ + TCGOp *op = tcg_emit_op(opc); + TCGOP_VECL(op) = type - TCG_TYPE_V64; + TCGOP_VECE(op) = vece; + op->args[0] = r; + op->args[1] = a; +} + +void vec_gen_3(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg r, TCGArg a, TCGArg b) +{ + TCGOp *op = tcg_emit_op(opc); + TCGOP_VECL(op) = type - TCG_TYPE_V64; + TCGOP_VECE(op) = vece; + op->args[0] = r; + op->args[1] = a; + op->args[2] = b; +} + +void vec_gen_4(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg r, TCGArg a, TCGArg b, TCGArg c) +{ + TCGOp *op = tcg_emit_op(opc); + TCGOP_VECL(op) = type - TCG_TYPE_V64; + TCGOP_VECE(op) = vece; + op->args[0] = r; + op->args[1] = a; + op->args[2] = b; + op->args[3] = c; +} + +static void vec_gen_op2(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGType type = rt->base_type; + + tcg_debug_assert(at->base_type == type); + vec_gen_2(opc, type, vece, temp_arg(rt), temp_arg(at)); +} + +static void vec_gen_op3(TCGOpcode opc, unsigned vece, + TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGType type = rt->base_type; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + vec_gen_3(opc, type, vece, temp_arg(rt), temp_arg(at), temp_arg(bt)); +} + +void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a) +{ + if (r != a) { + vec_gen_op2(INDEX_op_mov_vec, 0, r, a); + } +} + +#define MO_REG (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32) + +static void tcg_gen_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a); +} + +TCGv_vec tcg_const_zeros_vec(TCGType type) +{ + TCGv_vec ret = tcg_temp_new_vec(type); + tcg_gen_dupi_vec(ret, MO_REG, 0); + return ret; +} + +TCGv_vec tcg_const_ones_vec(TCGType type) +{ + TCGv_vec ret = tcg_temp_new_vec(type); + tcg_gen_dupi_vec(ret, MO_REG, -1); + return ret; +} + +TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec m) +{ + TCGTemp *t = tcgv_vec_temp(m); + return tcg_const_zeros_vec(t->base_type); +} + +TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m) +{ + TCGTemp *t = tcgv_vec_temp(m); + return tcg_const_ones_vec(t->base_type); +} + +void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a) +{ + if (TCG_TARGET_REG_BITS == 32 && a == deposit64(a, 32, 32, a)) { + tcg_gen_dupi_vec(r, MO_32, a); + } else if (TCG_TARGET_REG_BITS == 64 || a == (uint64_t)(int32_t)a) { + tcg_gen_dupi_vec(r, MO_64, a); + } else { + TCGv_i64 c = tcg_const_i64(a); + tcg_gen_dup_i64_vec(MO_64, r, c); + tcg_temp_free_i64(c); + } +} + +void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a) +{ + tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffffffffu) * a); +} + +void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a) +{ + tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffff) * (a & 0xffff)); +} + +void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a) +{ + tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xff) * (a & 0xff)); +} + +void tcg_gen_movi_v64(TCGv_vec r, uint64_t a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGArg ri = temp_arg(rt); + + tcg_debug_assert(rt->base_type == TCG_TYPE_V64); + if (TCG_TARGET_REG_BITS == 64) { + vec_gen_2(INDEX_op_movi_vec, TCG_TYPE_V64, 0, ri, a); + } else { + vec_gen_3(INDEX_op_movi_vec, TCG_TYPE_V64, 0, ri, a, a >> 32); + } +} + +void tcg_gen_movi_v128(TCGv_vec r, uint64_t a, uint64_t b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGArg ri = temp_arg(rt); + + tcg_debug_assert(rt->base_type == TCG_TYPE_V128); + if (a == b) { + tcg_gen_dup64i_vec(r, a); + } else if (TCG_TARGET_REG_BITS == 64) { + vec_gen_3(INDEX_op_movi_vec, TCG_TYPE_V128, 0, ri, a, b); + } else { + TCGOp *op = tcg_emit_op(INDEX_op_movi_vec); + TCGOP_VECL(op) = TCG_TYPE_V128 - TCG_TYPE_V64; + op->args[0] = ri; + op->args[1] = a; + op->args[2] = a >> 32; + op->args[3] = b; + op->args[4] = b >> 32; + } +} + +void tcg_gen_movi_v256(TCGv_vec r, uint64_t a, uint64_t b, + uint64_t c, uint64_t d) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGTemp *rt = arg_temp(ri); + + tcg_debug_assert(rt->base_type == TCG_TYPE_V256); + if (a == b && a == c && a == d) { + tcg_gen_dup64i_vec(r, a); + } else { + TCGOp *op = tcg_emit_op(INDEX_op_movi_vec); + TCGOP_VECL(op) = TCG_TYPE_V256 - TCG_TYPE_V64; + op->args[0] = ri; + if (TCG_TARGET_REG_BITS == 64) { + op->args[1] = a; + op->args[2] = b; + op->args[3] = c; + op->args[4] = d; + } else { + op->args[1] = a; + op->args[2] = a >> 32; + op->args[3] = b; + op->args[4] = b >> 32; + op->args[5] = c; + op->args[6] = c >> 32; + op->args[7] = d; + op->args[8] = d >> 32; + } + } +} + +void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + if (TCG_TARGET_REG_BITS == 64) { + TCGArg ai = tcgv_i64_arg(a); + vec_gen_2(INDEX_op_dup_vec, type, MO_64, ri, ai); + } else if (vece == MO_64) { + TCGArg al = tcgv_i32_arg(TCGV_LOW(a)); + TCGArg ah = tcgv_i32_arg(TCGV_HIGH(a)); + vec_gen_3(INDEX_op_dup2_vec, type, MO_64, ri, al, ah); + } else { + TCGArg ai = tcgv_i32_arg(TCGV_LOW(a)); + vec_gen_2(INDEX_op_dup_vec, type, MO_64, ri, ai); + } +} + +void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec r, TCGv_i32 a) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGArg ai = tcgv_i32_arg(a); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + vec_gen_2(INDEX_op_dup_vec, type, vece, ri, ai); +} + +static void vec_gen_ldst(TCGOpcode opc, TCGv_vec r, TCGv_ptr b, TCGArg o) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGArg bi = tcgv_ptr_arg(b); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + vec_gen_3(opc, type, 0, ri, bi, o); +} + +void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr b, TCGArg o) +{ + vec_gen_ldst(INDEX_op_ld_vec, r, b, o); +} + +void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr b, TCGArg o) +{ + vec_gen_ldst(INDEX_op_st_vec, r, b, o); +} + +void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType low_type) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGArg bi = tcgv_ptr_arg(b); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + tcg_debug_assert(low_type >= TCG_TYPE_V64); + tcg_debug_assert(low_type <= type); + vec_gen_3(INDEX_op_st_vec, low_type, 0, ri, bi, o); +} + +void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_add_vec, vece, r, a, b); +} + +void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_sub_vec, vece, r, a, b); +} + +void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_and_vec, 0, r, a, b); +} + +void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_or_vec, 0, r, a, b); +} + +void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_xor_vec, 0, r, a, b); +} + +void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + if (TCG_TARGET_HAS_andc_vec) { + vec_gen_op3(INDEX_op_andc_vec, 0, r, a, b); + } else { + TCGv_vec t = tcg_temp_new_vec_matching(r); + tcg_gen_not_vec(0, t, b); + tcg_gen_and_vec(0, r, a, t); + tcg_temp_free_vec(t); + } +} + +void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + if (TCG_TARGET_HAS_orc_vec) { + vec_gen_op3(INDEX_op_orc_vec, 0, r, a, b); + } else { + TCGv_vec t = tcg_temp_new_vec_matching(r); + tcg_gen_not_vec(0, t, b); + tcg_gen_or_vec(0, r, a, t); + tcg_temp_free_vec(t); + } +} + +void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + if (TCG_TARGET_HAS_not_vec) { + vec_gen_op2(INDEX_op_not_vec, 0, r, a); + } else { + TCGv_vec t = tcg_const_ones_vec_matching(r); + tcg_gen_xor_vec(0, r, a, t); + tcg_temp_free_vec(t); + } +} + +void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + if (TCG_TARGET_HAS_neg_vec) { + vec_gen_op2(INDEX_op_neg_vec, vece, r, a); + } else { + TCGv_vec t = tcg_const_zeros_vec_matching(r); + tcg_gen_sub_vec(vece, r, t, a); + tcg_temp_free_vec(t); + } +} diff --git a/tcg/tcg.c b/tcg/tcg.c index 93caa0be93..16b8faf66f 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -106,6 +106,18 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, tcg_target_long arg); static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args, const int *const_args); +#if TCG_TARGET_MAYBE_vec +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl, + unsigned vece, const TCGArg *args, + const int *const_args); +#else +static inline void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl, + unsigned vece, const TCGArg *args, + const int *const_args) +{ + g_assert_not_reached(); +} +#endif static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1, intptr_t arg2); static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -146,8 +158,7 @@ struct tcg_region_state { }; static struct tcg_region_state region; - -static TCGRegSet tcg_target_available_regs[2]; +static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT]; static TCGRegSet tcg_target_call_clobber_regs; #if TCG_TARGET_INSN_UNIT_SIZE == 1 @@ -1026,6 +1037,41 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local) return temp_tcgv_i64(t); } +TCGv_vec tcg_temp_new_vec(TCGType type) +{ + TCGTemp *t; + +#ifdef CONFIG_DEBUG_TCG + switch (type) { + case TCG_TYPE_V64: + assert(TCG_TARGET_HAS_v64); + break; + case TCG_TYPE_V128: + assert(TCG_TARGET_HAS_v128); + break; + case TCG_TYPE_V256: + assert(TCG_TARGET_HAS_v256); + break; + default: + g_assert_not_reached(); + } +#endif + + t = tcg_temp_new_internal(type, 0); + return temp_tcgv_vec(t); +} + +/* Create a new temp of the same type as an existing temp. */ +TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match) +{ + TCGTemp *t = tcgv_vec_temp(match); + + tcg_debug_assert(t->temp_allocated != 0); + + t = tcg_temp_new_internal(t->base_type, 0); + return temp_tcgv_vec(t); +} + static void tcg_temp_free_internal(TCGTemp *ts) { TCGContext *s = tcg_ctx; @@ -1057,6 +1103,11 @@ void tcg_temp_free_i64(TCGv_i64 arg) tcg_temp_free_internal(tcgv_i64_temp(arg)); } +void tcg_temp_free_vec(TCGv_vec arg) +{ + tcg_temp_free_internal(tcgv_vec_temp(arg)); +} + TCGv_i32 tcg_const_i32(int32_t val) { TCGv_i32 t0; @@ -1114,6 +1165,9 @@ int tcg_check_temp_count(void) Test the runtime variable that controls each opcode. */ bool tcg_op_supported(TCGOpcode op) { + const bool have_vec + = TCG_TARGET_HAS_v64 | TCG_TARGET_HAS_v128 | TCG_TARGET_HAS_v256; + switch (op) { case INDEX_op_discard: case INDEX_op_set_label: @@ -1327,6 +1381,29 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_mulsh_i64: return TCG_TARGET_HAS_mulsh_i64; + case INDEX_op_mov_vec: + case INDEX_op_movi_vec: + case INDEX_op_dup_vec: + case INDEX_op_dupi_vec: + case INDEX_op_ld_vec: + case INDEX_op_st_vec: + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + return have_vec; + case INDEX_op_dup2_vec: + return have_vec && TCG_TARGET_REG_BITS == 32; + case INDEX_op_not_vec: + return have_vec && TCG_TARGET_HAS_not_vec; + case INDEX_op_neg_vec: + return have_vec && TCG_TARGET_HAS_neg_vec; + case INDEX_op_andc_vec: + return have_vec && TCG_TARGET_HAS_andc_vec; + case INDEX_op_orc_vec: + return have_vec && TCG_TARGET_HAS_orc_vec; + case NB_OPS: break; } @@ -1661,6 +1738,14 @@ void tcg_dump_ops(TCGContext *s) nb_iargs = def->nb_iargs; nb_cargs = def->nb_cargs; + if (c == INDEX_op_movi_vec) { + nb_cargs = (64 / TCG_TARGET_REG_BITS) << TCGOP_VECL(op); + } + if (def->flags & TCG_OPF_VECTOR) { + col += qemu_log("v%d,e%d,", 64 << TCGOP_VECL(op), + 8 << TCGOP_VECE(op)); + } + k = 0; for (i = 0; i < nb_oargs; i++) { if (k != 0) { @@ -2890,8 +2975,13 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op) } /* emit instruction */ - tcg_out_op(s, op->opc, new_args, const_args); - + if (def->flags & TCG_OPF_VECTOR) { + tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op), + new_args, const_args); + } else { + tcg_out_op(s, op->opc, new_args, const_args); + } + /* move the outputs in the correct register if needed */ for(i = 0; i < nb_oargs; i++) { ts = arg_temp(op->args[i]); @@ -3239,10 +3329,12 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb) switch (opc) { case INDEX_op_mov_i32: case INDEX_op_mov_i64: + case INDEX_op_mov_vec: tcg_reg_alloc_mov(s, op); break; case INDEX_op_movi_i32: case INDEX_op_movi_i64: + case INDEX_op_dupi_vec: tcg_reg_alloc_movi(s, op); break; case INDEX_op_insn_start: diff --git a/tcg/README b/tcg/README index 03bfb6acd4..e14990fb9b 100644 --- a/tcg/README +++ b/tcg/README @@ -503,6 +503,64 @@ of the memory access. For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a 64-bit memory access specified in flags. +********* Host vector operations + +All of the vector ops have two parameters, TCGOP_VECL & TCGOP_VECE. +The former specifies the length of the vector in log2 64-bit units; the +later specifies the length of the element (if applicable) in log2 8-bit units. +E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. + +* mov_vec v0, v1 +* ld_vec v0, t1 +* st_vec v0, t1 + + Move, load and store. + +* movi_vec v0, a, b, ... + + Move with constant data. There are arguments to hold the entire + vector value, stored little-endian. Note that the way MAX_OPC_PARAM + is sized for 64- and 32-bit hosts, there are enough slots for v256, + but not a future v512. + + Prefer dupi_vec when possible. + +* dup_vec v0, r1 + + Duplicate the low N bits of R1 into VECL/VECE copies across V0. + +* dupi_vec v0, c + + Similarly, for a constant. + Smaller values will be replicated to host register size by the expanders. + +* dup2_vec v0, r1, r2 + + Duplicate r2:r1 into VECL/64 copies across V0. This opcode is + only present for 32-bit hosts. + +* add_vec v0, v1, v2 + + v0 = v1 + v2, in elements across the vector. + +* sub_vec v0, v1, v2 + + Similarly, v0 = v1 - v2. + +* neg_vec v0, v1 + + Similarly, v0 = -v1. + +* and_vec v0, v1, v2 +* or_vec v0, v1, v2 +* xor_vec v0, v1, v2 +* andc_vec v0, v1, v2 +* orc_vec v0, v1, v2 +* not_vec v0, v1 + + Similarly, logical operations with and without compliment. + Note that VECE is unused. + ********* Note 1: Some shortcuts are defined when the last operand is known to be From patchwork Mon Dec 18 17:17:34 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122255 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3112364qgn; Mon, 18 Dec 2017 09:31:15 -0800 (PST) X-Google-Smtp-Source: ACJfBotM41882HKNNbrQ4msAd1YN6fL+Y75MbLj38+UkLu2C9gLGDR5Q2OPSe8ekaMrQ/cHziWRF X-Received: by 10.37.87.8 with SMTP id l8mr425181ybb.199.1513618275000; Mon, 18 Dec 2017 09:31:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618274; cv=none; d=google.com; s=arc-20160816; b=GiZfzNynf1jfidW/x3jLBXnhHAwA2Zy8cE5G/gROoy2yaJLr3/hMXm70yaJ/bWwffz pMr7QMEWhJf+FPDdzFZqk6mWEC68zFwAvrgFj4S8Yh6e+UiKN8BGPpAVOQBdZiap2kqL wO+jWmOBkb4PLSCW6VWiFgUuhjSh4u0IMCNdICEp2JzJNk8S5kXmvRatRvtMqb5FnmNU O+m4V7wjWlpwwU24v7hFWd+itfrBp12RLmOG7CRbpBmcKPYLbC4pgYPqLRWzYJKUHsuL F5SIXvxGE/NUymBDWBC+FTaClQkQoaEOTQxPQCAL2rznCivhIsTWXR5hGk+Q8XEr0VbX XBTA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=vTRkGgb8ANdYsJ9jsEEHbGrrwAUDEtReEQXNXHvjjGc=; b=O91l7jB0ss8k4W9i9O61BpoUbPJndn4N8LdTQTJ3A9csLxIgvjm6TMkDHRWmglK5T+ /UQXRy+/5Kqvy4my5xH7r9ydkk+nChqpFtKmdnc2lzH5cRVIE6b+hpX7g48H/9adEvY8 W+8g+VwG0yP5jBvL7DXVpdwLg7undwcf78VsLufhB2hIvRtLNsOBaAKrEx8MBtNUmHYC ZMW7XNBWu08gW1lj4CBCPJ7M95yqyVHXLUNFjJTdu3b80rv4dCTAYR36sjJnWCYTkl0i 1tza7MWn7BuyUyvgRWYgthWoW/gZcRbWOenX0UM6cxp7wlDISlDc3uuecUp6BtEYTQkG TuiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=ClQ7JN4w; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id n4si2622352ybn.570.2017.12.18.09.31.14 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:31:14 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=ClQ7JN4w; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58595 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzFy-0001l7-9m for patch@linaro.org; Mon, 18 Dec 2017 12:31:14 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37052) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3W-0000PX-Lo for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3G-0000ll-Qm for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:22 -0500 Received: from mail-pg0-x235.google.com ([2607:f8b0:400e:c05::235]:39585) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3G-0000kH-DR for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:06 -0500 Received: by mail-pg0-x235.google.com with SMTP id w7so9403770pgv.6 for ; Mon, 18 Dec 2017 09:18:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=vTRkGgb8ANdYsJ9jsEEHbGrrwAUDEtReEQXNXHvjjGc=; b=ClQ7JN4wmckl8mGahNDcwOB9j19JvGBw3AGsl6IyqWbA1OjXmeP7zS0KMAvvpEaAPe Qg25SMw8YboRXDNz93SzkRopVNKW9KVY+RXbL1iFxdZsbmJbRx8kHro/LTdYh7fhXZmS 0iexbfZQENnkm/AoraKm+Qu+cc9UHWxIjl5B8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=vTRkGgb8ANdYsJ9jsEEHbGrrwAUDEtReEQXNXHvjjGc=; b=J7Oad+jnuZ3Ad47jdYdhBMHNtj4AiJ8EAyd0WGyX2b+FGSQEptAf6iI0/ZsquMeqHu Ej3lod+rVqLRAxRynhjJWOTvVf3BPWCx8RgGaeF62O1iOYQRSzECAU9NMd7xxwnAteLV q6po9V5FZktZ3zWh1Gmj4wdk28/fbeeUAPWwTAql/0Jst3Swv4mEh8G4awSO3ylrePxn Rj2RBWj7uvSDNfy89hKSDqj9zPywrNwbDzAO4/KKNwzj0E/dhoqZl/yGZrg+qVAwKRiG QNOiwZdEON9vp3q5vEKAoZTtu8naKjBeULS+9y7TWQRrzXt0WG/ylVcdSXP+pD+7O4UQ nvEw== X-Gm-Message-State: AKGB3mJWq5HdqENnMRE8kcpzAO0sKysAZslXeRdIzQJYknC/9i3qgFd7 3NazXZWug+OozS8vK3HkYfKSO8YDKb0= X-Received: by 10.99.97.66 with SMTP id v63mr371089pgb.184.1513617484230; Mon, 18 Dec 2017 09:18:04 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.02 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:03 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:34 -0800 Message-Id: <20171218171758.16964-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::235 Subject: [Qemu-devel] [PATCH v7 02/26] tcg: Add generic vector expanders X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- Makefile.target | 2 +- accel/tcg/tcg-runtime.h | 29 ++ tcg/tcg-gvec-desc.h | 49 ++ tcg/tcg-op-gvec.h | 152 ++++++ tcg/tcg-op.h | 1 + accel/tcg/tcg-runtime-gvec.c | 295 ++++++++++++ tcg/tcg-op-gvec.c | 1099 ++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 36 +- accel/tcg/Makefile.objs | 2 +- 9 files changed, 1655 insertions(+), 10 deletions(-) create mode 100644 tcg/tcg-gvec-desc.h create mode 100644 tcg/tcg-op-gvec.h create mode 100644 accel/tcg/tcg-runtime-gvec.c create mode 100644 tcg/tcg-op-gvec.c -- 2.14.3 diff --git a/Makefile.target b/Makefile.target index 7f30a1e725..6549481096 100644 --- a/Makefile.target +++ b/Makefile.target @@ -93,7 +93,7 @@ all: $(PROGS) stap # cpu emulator library obj-y += exec.o obj-y += accel/ -obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o +obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index 1df17d0ba9..76ee41ce58 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -134,3 +134,32 @@ GEN_ATOMIC_HELPERS(xor_fetch) GEN_ATOMIC_HELPERS(xchg) #undef GEN_ATOMIC_HELPERS + +DEF_HELPER_FLAGS_3(gvec_mov, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_dup8, TCG_CALL_NO_RWG, void, ptr, i32, i32) +DEF_HELPER_FLAGS_3(gvec_dup16, TCG_CALL_NO_RWG, void, ptr, i32, i32) +DEF_HELPER_FLAGS_3(gvec_dup32, TCG_CALL_NO_RWG, void, ptr, i32, i32) +DEF_HELPER_FLAGS_3(gvec_dup64, TCG_CALL_NO_RWG, void, ptr, i32, i64) + +DEF_HELPER_FLAGS_4(gvec_add8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_add16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_add32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_add64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_sub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_neg64, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_not, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_and, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/tcg-gvec-desc.h b/tcg/tcg-gvec-desc.h new file mode 100644 index 0000000000..8ba9a8168d --- /dev/null +++ b/tcg/tcg-gvec-desc.h @@ -0,0 +1,49 @@ +/* + * Generic vector operation descriptor + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +/* ??? These bit widths are set for ARM SVE, maxing out at 256 byte vectors. */ +#define SIMD_OPRSZ_SHIFT 0 +#define SIMD_OPRSZ_BITS 5 + +#define SIMD_MAXSZ_SHIFT (SIMD_OPRSZ_SHIFT + SIMD_OPRSZ_BITS) +#define SIMD_MAXSZ_BITS 5 + +#define SIMD_DATA_SHIFT (SIMD_MAXSZ_SHIFT + SIMD_MAXSZ_BITS) +#define SIMD_DATA_BITS (32 - SIMD_DATA_SHIFT) + +/* Create a descriptor from components. */ +uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data); + +/* Extract the operation size from a descriptor. */ +static inline intptr_t simd_oprsz(uint32_t desc) +{ + return (extract32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS) + 1) * 8; +} + +/* Extract the max vector size from a descriptor. */ +static inline intptr_t simd_maxsz(uint32_t desc) +{ + return (extract32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS) + 1) * 8; +} + +/* Extract the operation-specific data from a descriptor. */ +static inline int32_t simd_data(uint32_t desc) +{ + return sextract32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS); +} diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h new file mode 100644 index 0000000000..95739946ff --- /dev/null +++ b/tcg/tcg-op-gvec.h @@ -0,0 +1,152 @@ +/* + * Generic vector operation expansion + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +/* + * "Generic" vectors. All operands are given as offsets from ENV, + * and therefore cannot also be allocated via tcg_global_mem_new_*. + * OPRSZ is the byte size of the vector upon which the operation is performed. + * MAXSZ is the byte size of the full vector; bytes beyond OPSZ are cleared. + * + * All sizes must be 8 or any multiple of 16. + * When OPRSZ is 8, the alignment may be 8, otherwise must be 16. + * Operands may completely, but not partially, overlap. + */ + +/* Expand a call to a gvec-style helper, with pointers to two vector + operands, and a descriptor (see tcg-gvec-desc.h). */ +typedef void (gen_helper_gvec_2)(TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_2 *fn); + +/* Similarly, passing an extra pointer (e.g. env or float_status). */ +typedef void (gen_helper_gvec_2_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_2_ptr *fn); + +/* Similarly, with three vector operands. */ +typedef void (gen_helper_gvec_3)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_3 *fn); + +typedef void (gen_helper_gvec_3_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_3_ptr *fn); + +/* Expand a gvec operation. Either inline or out-of-line depending on + the actual vector size and the operations supported by the host. */ +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_2 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; +} GVecGen2; + +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_3 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; + /* Load dest as a 3rd source operand. */ + bool load_dest; +} GVecGen3; + +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, const GVecGen2 *); +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t opsz, uint32_t clsz, const GVecGen3 *); + +/* Expand a specific vector operation. */ + +void tcg_gen_gvec_mov(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz); + +void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); + +void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); + +void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t s, uint32_t m); +void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s, + uint32_t m, TCGv_i32); +void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s, + uint32_t m, TCGv_i64); + +void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t s, uint32_t m, uint8_t x); +void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x); +void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x); +void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x); + +/* + * 64-bit vector operations. Use these when the register has been allocated + * with tcg_global_mem_new_i64, and so we cannot also address it via pointer. + * OPRSZ = MAXSZ = 8. + */ + +void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 a); +void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 a); +void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 a); + +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); + +void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 9b0560e4d3..5f49785cb3 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -914,6 +914,7 @@ void tcg_gen_dup8i_vec(TCGv_vec, uint32_t); void tcg_gen_dup16i_vec(TCGv_vec, uint32_t); void tcg_gen_dup32i_vec(TCGv_vec, uint32_t); void tcg_gen_dup64i_vec(TCGv_vec, uint64_t); +void tcg_gen_dupi_vec(unsigned vece, TCGv_vec, uint64_t); void tcg_gen_movi_v64(TCGv_vec, uint64_t); void tcg_gen_movi_v128(TCGv_vec, uint64_t, uint64_t); void tcg_gen_movi_v256(TCGv_vec, uint64_t, uint64_t, uint64_t, uint64_t); diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c new file mode 100644 index 0000000000..cd1ce12b7e --- /dev/null +++ b/accel/tcg/tcg-runtime-gvec.c @@ -0,0 +1,295 @@ +/* + * Generic vectorized operation runtime + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +#include "qemu/osdep.h" +#include "qemu/host-utils.h" +#include "cpu.h" +#include "exec/helper-proto.h" +#include "tcg-gvec-desc.h" + + +/* Virtually all hosts support 16-byte vectors. Those that don't can emulate + them via GCC's generic vector extension. This turns out to be simpler and + more reliable than getting the compiler to autovectorize. + + In tcg-op-gvec.c, we asserted that both the size and alignment + of the data are multiples of 16. */ + +typedef uint8_t vec8 __attribute__((vector_size(16))); +typedef uint16_t vec16 __attribute__((vector_size(16))); +typedef uint32_t vec32 __attribute__((vector_size(16))); +typedef uint64_t vec64 __attribute__((vector_size(16))); + +static inline void clear_high(void *d, intptr_t oprsz, uint32_t desc) +{ + intptr_t maxsz = simd_maxsz(desc); + intptr_t i; + + if (unlikely(maxsz > oprsz)) { + for (i = oprsz; i < maxsz; i += sizeof(uint64_t)) { + *(uint64_t *)(d + i) = 0; + } + } +} + +void HELPER(gvec_add8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) + *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) + *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) + *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) + *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) - *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) - *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) - *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) - *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg8)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = -*(vec8 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg16)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = -*(vec16 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg32)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = -*(vec32 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg64)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = -*(vec64 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mov)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + + memcpy(d, a, oprsz); + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_dup64)(void *d, uint32_t desc, uint64_t c) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + if (c == 0) { + oprsz = 0; + } else { + for (i = 0; i < oprsz; i += sizeof(uint64_t)) { + *(uint64_t *)(d + i) = c; + } + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_dup32)(void *d, uint32_t desc, uint32_t c) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + if (c == 0) { + oprsz = 0; + } else { + for (i = 0; i < oprsz; i += sizeof(uint32_t)) { + *(uint32_t *)(d + i) = c; + } + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_dup16)(void *d, uint32_t desc, uint32_t c) +{ + HELPER(gvec_dup32)(d, desc, 0x00010001 * (c & 0xffff)); +} + +void HELPER(gvec_dup8)(void *d, uint32_t desc, uint32_t c) +{ + HELPER(gvec_dup32)(d, desc, 0x01010101 * (c & 0xff)); +} + +void HELPER(gvec_not)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = ~*(vec64 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_and)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) & *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_or)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) | *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_xor)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) ^ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_andc)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c new file mode 100644 index 0000000000..120e301096 --- /dev/null +++ b/tcg/tcg-op-gvec.c @@ -0,0 +1,1099 @@ +/* + * Generic vector operation expansion + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "tcg.h" +#include "tcg-op.h" +#include "tcg-op-gvec.h" +#include "tcg-gvec-desc.h" + +#define REP8(x) ((x) * 0x0101010101010101ull) +#define REP16(x) ((x) * 0x0001000100010001ull) + +#define MAX_UNROLL 4 + +/* Verify vector size and alignment rules. OFS should be the OR of all + of the operand offsets so that we can check them all at once. */ +static void check_size_align(uint32_t oprsz, uint32_t maxsz, uint32_t ofs) +{ + uint32_t align = maxsz > 16 || oprsz >= 16 ? 15 : 7; + tcg_debug_assert(oprsz > 0); + tcg_debug_assert(oprsz <= maxsz); + tcg_debug_assert((oprsz & align) == 0); + tcg_debug_assert((maxsz & align) == 0); + tcg_debug_assert((ofs & align) == 0); +} + +/* Verify vector overlap rules for two operands. */ +static void check_overlap_2(uint32_t d, uint32_t a, uint32_t s) +{ + tcg_debug_assert(d == a || d + s <= a || a + s <= d); +} + +/* Verify vector overlap rules for three operands. */ +static void check_overlap_3(uint32_t d, uint32_t a, uint32_t b, uint32_t s) +{ + check_overlap_2(d, a, s); + check_overlap_2(d, b, s); + check_overlap_2(a, b, s); +} + +/* Create a descriptor from components. */ +uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data) +{ + uint32_t desc = 0; + + assert(oprsz % 8 == 0 && oprsz <= (8 << SIMD_OPRSZ_BITS)); + assert(maxsz % 8 == 0 && maxsz <= (8 << SIMD_MAXSZ_BITS)); + assert(data == sextract32(data, 0, SIMD_DATA_BITS)); + + oprsz = (oprsz / 8) - 1; + maxsz = (maxsz / 8) - 1; + desc = deposit32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS, oprsz); + desc = deposit32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS, maxsz); + desc = deposit32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS, data); + + return desc; +} + +/* Generate a call to a gvec-style helper with two vector operands. */ +void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_2 *fn) +{ + TCGv_ptr a0, a1; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + + fn(a0, a1, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands. */ +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_3 *fn) +{ + TCGv_ptr a0, a1, a2; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + + fn(a0, a1, a2, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_2_ptr *fn) +{ + TCGv_ptr a0, a1; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + + fn(a0, a1, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_3_ptr *fn) +{ + TCGv_ptr a0, a1, a2; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + + fn(a0, a1, a2, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_i32(desc); +} + +/* Return true if we want to implement something of OPRSZ bytes + in units of LNSZ. This limits the expansion of inline code. */ +static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz) +{ + uint32_t lnct = oprsz / lnsz; + return lnct >= 1 && lnct <= MAX_UNROLL; +} + +static void expand_clr(uint32_t dofs, uint32_t maxsz); + +/* Set OPRSZ bytes at DOFS to replications of IN or IN_C. */ +static void do_dup_i32(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i32 in, uint32_t in_c, + void (*ool)(TCGv_ptr, TCGv_i32, TCGv_i32)) +{ + TCGType type; + TCGv_vec t_vec; + uint32_t i; + + assert(vece <= MO_32); + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + type = TCG_TYPE_V256; + } else if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + type = TCG_TYPE_V128; + } else if (TCG_TARGET_HAS_v64 && check_size_impl(oprsz, 8)) { + type = TCG_TYPE_V64; + } else { + if (check_size_impl(oprsz, 4)) { + TCGv_i32 t_i32 = tcg_temp_new_i32(); + + if (in) { + switch (vece) { + case MO_8: + tcg_gen_deposit_i32(t_i32, in, in, 8, 24); + in = t_i32; + /* fallthru */ + case MO_16: + tcg_gen_deposit_i32(t_i32, in, in, 16, 16); + break; + } + } else { + switch (vece) { + case MO_8: + in_c = (in_c & 0xff) * 0x01010101; + break; + case MO_16: + in_c = deposit32(in_c, 16, 16, in_c); + break; + } + tcg_gen_movi_i32(t_i32, in_c); + } + + for (i = 0; i < oprsz; i += 4) { + tcg_gen_st_i32(t_i32, cpu_env, dofs + i); + } + tcg_temp_free_i32(t_i32); + goto done; + } else { + TCGv_i32 t_i32 = in ? in : tcg_const_i32(in_c); + TCGv_ptr a0 = tcg_temp_new_ptr(); + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, 0)); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + ool(a0, desc, t_i32); + + tcg_temp_free_ptr(a0); + tcg_temp_free_i32(desc); + if (in == NULL) { + tcg_temp_free_i32(t_i32); + } + return; + } + } + + t_vec = tcg_temp_new_vec(type); + if (in) { + tcg_gen_dup_i32_vec(vece, t_vec, in); + } else { + switch (vece) { + case MO_8: + tcg_gen_dup8i_vec(t_vec, in_c); + break; + case MO_16: + tcg_gen_dup16i_vec(t_vec, in_c); + break; + default: + tcg_gen_dup32i_vec(t_vec, in_c); + break; + } + } + + i = 0; + if (TCG_TARGET_HAS_v256) { + for (; i + 32 <= oprsz; i += 32) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256); + } + } + if (TCG_TARGET_HAS_v128) { + for (; i + 16 <= oprsz; i += 16) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128); + } + } + if (TCG_TARGET_HAS_v64) { + for (; i < oprsz; i += 8) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64); + } + } + tcg_temp_free_vec(t_vec); + + done: + tcg_debug_assert(i == oprsz); + if (i < maxsz) { + expand_clr(dofs + i, maxsz - i); + } +} + +/* Likewise, but with 64-bit quantities. */ +static void do_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i64 in, uint64_t in_c) +{ + TCGType type; + TCGv_vec t_vec; + uint32_t i; + + assert(vece <= MO_64); + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + type = TCG_TYPE_V256; + } else if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + type = TCG_TYPE_V128; + } else if (TCG_TARGET_HAS_v64 && TCG_TARGET_REG_BITS == 32 + && check_size_impl(oprsz, 8)) { + type = TCG_TYPE_V64; + } else { + if (check_size_impl(oprsz, 8)) { + TCGv_i64 t_i64 = tcg_temp_new_i64(); + + if (in) { + switch (vece) { + case MO_8: + tcg_gen_deposit_i64(t_i64, in, in, 8, 56); + in = t_i64; + /* fallthru */ + case MO_16: + tcg_gen_deposit_i64(t_i64, in, in, 16, 48); + in = t_i64; + /* fallthru */ + case MO_32: + tcg_gen_deposit_i64(t_i64, in, in, 32, 32); + break; + } + } else { + switch (vece) { + case MO_8: + in_c = (in_c & 0xff) * 0x0101010101010101ull; + break; + case MO_16: + in_c = (in_c & 0xffff) * 0x0001000100010001ull; + break; + case MO_32: + in_c = deposit64(in_c, 32, 32, in_c); + break; + } + tcg_gen_movi_i64(t_i64, in_c); + } + + for (i = 0; i < oprsz; i += 8) { + tcg_gen_st_i64(t_i64, cpu_env, dofs + i); + } + tcg_temp_free_i64(t_i64); + goto done; + } else { + TCGv_i64 t_i64 = in ? in : tcg_const_i64(in_c); + TCGv_ptr a0 = tcg_temp_new_ptr(); + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, 0)); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + gen_helper_gvec_dup64(a0, desc, t_i64); + + tcg_temp_free_ptr(a0); + tcg_temp_free_i32(desc); + if (in == NULL) { + tcg_temp_free_i64(t_i64); + } + return; + } + } + + t_vec = tcg_temp_new_vec(type); + if (in) { + tcg_gen_dup_i64_vec(vece, t_vec, in); + } else { + switch (vece) { + case MO_8: + tcg_gen_dup8i_vec(t_vec, in_c); + break; + case MO_16: + tcg_gen_dup16i_vec(t_vec, in_c); + break; + case MO_32: + tcg_gen_dup32i_vec(t_vec, in_c); + break; + default: + tcg_gen_dup64i_vec(t_vec, in_c); + break; + } + } + + i = 0; + if (TCG_TARGET_HAS_v256) { + for (; i + 32 <= oprsz; i += 32) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256); + } + } + if (TCG_TARGET_HAS_v128) { + for (; i + 16 <= oprsz; i += 16) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128); + } + } + if (TCG_TARGET_HAS_v64) { + for (; i < oprsz; i += 8) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64); + } + } + tcg_temp_free_vec(t_vec); + + done: + tcg_debug_assert(i == oprsz); + if (i < maxsz) { + expand_clr(dofs + i, maxsz - i); + } +} + +/* Likewise, but with zero. */ +static void expand_clr(uint32_t dofs, uint32_t maxsz) +{ + if (TCG_TARGET_REG_BITS == 64) { + do_dup_i64(MO_64, dofs, maxsz, maxsz, NULL, 0); + } else { + do_dup_i32(MO_32, dofs, maxsz, maxsz, NULL, 0, gen_helper_gvec_dup32); + } +} + +/* Expand OPSZ bytes worth of two-operand operations using i32 elements. */ +static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t opsz, + void (*fni)(TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + fni(t0, t0); + tcg_gen_st_i32(t0, cpu_env, dofs + i); + } + tcg_temp_free_i32(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ +static void expand_3_i32(uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, bool load_dest, + void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + TCGv_i32 t2 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + tcg_gen_ld_i32(t1, cpu_env, bofs + i); + if (load_dest) { + tcg_gen_ld_i32(t2, cpu_env, dofs + i); + } + fni(t2, t0, t1); + tcg_gen_st_i32(t2, cpu_env, dofs + i); + } + tcg_temp_free_i32(t2); + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + +/* Expand OPSZ bytes worth of two-operand operations using i64 elements. */ +static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz, + void (*fni)(TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + fni(t0, t0); + tcg_gen_st_i64(t0, cpu_env, dofs + i); + } + tcg_temp_free_i64(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ +static void expand_3_i64(uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, bool load_dest, + void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + tcg_gen_ld_i64(t1, cpu_env, bofs + i); + if (load_dest) { + tcg_gen_ld_i64(t2, cpu_env, dofs + i); + } + fni(t2, t0, t1); + tcg_gen_st_i64(t2, cpu_env, dofs + i); + } + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + +/* Expand OPSZ bytes worth of two-operand operations using host vectors. */ +static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t tysz, TCGType type, + void (*fni)(unsigned, TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + fni(vece, t0, t0); + tcg_gen_st_vec(t0, cpu_env, dofs + i); + } + tcg_temp_free_vec(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using host vectors. */ +static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, + uint32_t tysz, TCGType type, bool load_dest, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + TCGv_vec t2 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + tcg_gen_ld_vec(t1, cpu_env, bofs + i); + if (load_dest) { + tcg_gen_ld_vec(t2, cpu_env, dofs + i); + } + fni(vece, t2, t0, t1); + tcg_gen_st_vec(t2, cpu_env, dofs + i); + } + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + +/* Expand a vector two-operand operation. */ +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_2_vec(g->vece, dofs, aofs, done, 32, TCG_TYPE_V256, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_2_vec(g->vece, dofs, aofs, done, 16, TCG_TYPE_V128, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && !g->prefer_i64) { + expand_2_vec(g->vece, dofs, aofs, done, 8, TCG_TYPE_V64, g->fniv); + } else if (g->fni8) { + expand_2_i64(dofs, aofs, done, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_2_i32(dofs, aofs, done, g->fni4); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, g->fno); +} + +/* Expand a vector three-operand operation. */ +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_3_vec(g->vece, dofs, aofs, bofs, done, 32, TCG_TYPE_V256, + g->load_dest, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_3_vec(g->vece, dofs, aofs, bofs, done, 16, TCG_TYPE_V128, + g->load_dest, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && !g->prefer_i64) { + expand_3_vec(g->vece, dofs, aofs, bofs, done, 8, TCG_TYPE_V64, + g->load_dest, g->fniv); + } else if (g->fni8) { + expand_3_i64(dofs, aofs, bofs, done, g->load_dest, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_3_i32(dofs, aofs, bofs, done, g->load_dest, g->fni4); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, g->fno); +} + +/* + * Expand specific vector operations. + */ + +static void vec_mov2(unsigned vece, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_mov_vec(a, b); +} + +void tcg_gen_gvec_mov(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen2 g = { + .fni8 = tcg_gen_mov_i64, + .fniv = vec_mov2, + .fno = gen_helper_gvec_mov, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_2(dofs, aofs, opsz, maxsz, &g); +} + +void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i32 in) +{ + typedef void dup_fn(TCGv_ptr, TCGv_i32, TCGv_i32); + static dup_fn * const fns[3] = { + gen_helper_gvec_dup8, + gen_helper_gvec_dup16, + gen_helper_gvec_dup32 + }; + + check_size_align(oprsz, maxsz, dofs); + tcg_debug_assert(vece <= MO_32); + do_dup_i32(vece, dofs, oprsz, maxsz, in, 0, fns[vece]); +} + +void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i64 in) +{ + check_size_align(oprsz, maxsz, dofs); + tcg_debug_assert(vece <= MO_64); + if (vece <= MO_32) { + /* This works for both host register sizes. */ + tcg_gen_gvec_dup_i32(vece, dofs, oprsz, maxsz, (TCGv_i32)in); + } else { + do_dup_i64(vece, dofs, oprsz, maxsz, in, 0); + } +} + +void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + tcg_debug_assert(vece <= MO_64); + if (vece <= MO_32) { + TCGv_i32 in = tcg_temp_new_i32(); + switch (vece) { + case MO_8: + tcg_gen_ld8u_i32(in, cpu_env, aofs); + break; + case MO_16: + tcg_gen_ld16u_i32(in, cpu_env, aofs); + break; + case MO_32: + tcg_gen_ld_i32(in, cpu_env, aofs); + break; + } + tcg_gen_gvec_dup_i32(vece, dofs, oprsz, maxsz, in); + tcg_temp_free_i32(in); + } else { + TCGv_i64 in = tcg_temp_new_i64(); + tcg_gen_ld_i64(in, cpu_env, aofs); + tcg_gen_gvec_dup_i64(MO_64, dofs, oprsz, maxsz, in); + tcg_temp_free_i64(in); + } +} + +void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint64_t x) +{ + check_size_align(oprsz, maxsz, dofs); + do_dup_i64(MO_64, dofs, oprsz, maxsz, NULL, x); +} + +void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint32_t x) +{ + if (TCG_TARGET_REG_BITS == 64) { + do_dup_i64(MO_64, dofs, oprsz, maxsz, NULL, deposit64(x, 32, 32, x)); + } else { + do_dup_i32(MO_32, dofs, oprsz, maxsz, NULL, x, gen_helper_gvec_dup32); + } +} + +void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint16_t x) +{ + tcg_gen_gvec_dup32i(dofs, oprsz, maxsz, 0x00010001 * x); +} + +void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint8_t x) +{ + tcg_gen_gvec_dup32i(dofs, oprsz, maxsz, 0x01010101 * x); +} + +void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen2 g = { + .fni8 = tcg_gen_not_i64, + .fniv = tcg_gen_not_vec, + .fno = gen_helper_gvec_not, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_2(dofs, aofs, opsz, maxsz, &g); +} + +/* Perform a vector addition using normal addition and a mask. The mask + should be the sign bit of each lane. This 6-operation form is more + efficient than separate additions when there are 4 or more lanes in + the 64-bit operation. */ +static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + + tcg_gen_andc_i64(t1, a, m); + tcg_gen_andc_i64(t2, b, m); + tcg_gen_xor_i64(t3, a, b); + tcg_gen_add_i64(d, t1, t2); + tcg_gen_and_i64(t3, t3, m); + tcg_gen_xor_i64(d, d, t3); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t3); +} + +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP8(0x80)); + gen_addv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP16(0x8000)); + gen_addv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + + tcg_gen_andi_i64(t1, a, ~0xffffffffull); + tcg_gen_add_i64(t2, a, b); + tcg_gen_add_i64(t1, t1, b); + tcg_gen_deposit_i64(d, t1, t2, 0, 32); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = tcg_gen_vec_add8_i64, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add8, + .opc = INDEX_op_add_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_add16_i64, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add16, + .opc = INDEX_op_add_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_add_i32, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add32, + .opc = INDEX_op_add_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_add_i64, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add64, + .opc = INDEX_op_add_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +/* Perform a vector subtraction using normal subtraction and a mask. + Compare gen_addv_mask above. */ +static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + + tcg_gen_or_i64(t1, a, m); + tcg_gen_andc_i64(t2, b, m); + tcg_gen_eqv_i64(t3, a, b); + tcg_gen_sub_i64(d, t1, t2); + tcg_gen_and_i64(t3, t3, m); + tcg_gen_xor_i64(d, d, t3); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t3); +} + +void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP8(0x80)); + gen_subv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP16(0x8000)); + gen_subv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + + tcg_gen_andi_i64(t1, b, ~0xffffffffull); + tcg_gen_sub_i64(t2, a, b); + tcg_gen_sub_i64(t1, a, t1); + tcg_gen_deposit_i64(d, t1, t2, 0, 32); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = tcg_gen_vec_sub8_i64, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub8, + .opc = INDEX_op_sub_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_sub16_i64, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub16, + .opc = INDEX_op_sub_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_sub_i32, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub32, + .opc = INDEX_op_sub_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_sub_i64, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub64, + .opc = INDEX_op_sub_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +/* Perform a vector negation using normal negation and a mask. + Compare gen_subv_mask above. */ +static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m) +{ + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + + tcg_gen_andc_i64(t3, m, b); + tcg_gen_andc_i64(t2, b, m); + tcg_gen_sub_i64(d, m, t2); + tcg_gen_xor_i64(d, d, t3); + + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t3); +} + +void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP8(0x80)); + gen_negv_mask(d, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP16(0x8000)); + gen_negv_mask(d, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 b) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + + tcg_gen_andi_i64(t1, b, ~0xffffffffull); + tcg_gen_neg_i64(t2, b); + tcg_gen_neg_i64(t1, t1); + tcg_gen_deposit_i64(d, t1, t2, 0, 32); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen2 g[4] = { + { .fni8 = tcg_gen_vec_neg8_i64, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg8, + .opc = INDEX_op_neg_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_neg16_i64, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg16, + .opc = INDEX_op_neg_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_neg_i32, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg32, + .opc = INDEX_op_neg_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_neg_i64, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg64, + .opc = INDEX_op_neg_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_2(dofs, aofs, opsz, maxsz, &g[vece]); +} + +void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_and_i64, + .fniv = tcg_gen_and_vec, + .fno = gen_helper_gvec_and, + .opc = INDEX_op_and_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g); +} + +void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_or_i64, + .fniv = tcg_gen_or_vec, + .fno = gen_helper_gvec_or, + .opc = INDEX_op_or_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g); +} + +void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_xor_i64, + .fniv = tcg_gen_xor_vec, + .fno = gen_helper_gvec_xor, + .opc = INDEX_op_xor_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g); +} + +void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_andc_i64, + .fniv = tcg_gen_andc_vec, + .fno = gen_helper_gvec_andc, + .opc = INDEX_op_andc_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g); +} + +void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_orc_i64, + .fniv = tcg_gen_orc_vec, + .fno = gen_helper_gvec_orc, + .opc = INDEX_op_orc_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index dc04c11860..5cfe4af6bd 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -104,7 +104,7 @@ void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a) #define MO_REG (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32) -static void tcg_gen_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) +static void do_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) { TCGTemp *rt = tcgv_vec_temp(r); vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a); @@ -113,14 +113,14 @@ static void tcg_gen_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) TCGv_vec tcg_const_zeros_vec(TCGType type) { TCGv_vec ret = tcg_temp_new_vec(type); - tcg_gen_dupi_vec(ret, MO_REG, 0); + do_dupi_vec(ret, MO_REG, 0); return ret; } TCGv_vec tcg_const_ones_vec(TCGType type) { TCGv_vec ret = tcg_temp_new_vec(type); - tcg_gen_dupi_vec(ret, MO_REG, -1); + do_dupi_vec(ret, MO_REG, -1); return ret; } @@ -139,9 +139,9 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m) void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a) { if (TCG_TARGET_REG_BITS == 32 && a == deposit64(a, 32, 32, a)) { - tcg_gen_dupi_vec(r, MO_32, a); + do_dupi_vec(r, MO_32, a); } else if (TCG_TARGET_REG_BITS == 64 || a == (uint64_t)(int32_t)a) { - tcg_gen_dupi_vec(r, MO_64, a); + do_dupi_vec(r, MO_64, a); } else { TCGv_i64 c = tcg_const_i64(a); tcg_gen_dup_i64_vec(MO_64, r, c); @@ -151,17 +151,37 @@ void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a) void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a) { - tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffffffffu) * a); + do_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffffffffu) * a); } void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a) { - tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffff) * (a & 0xffff)); + do_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffff) * (a & 0xffff)); } void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a) { - tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xff) * (a & 0xff)); + do_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xff) * (a & 0xff)); +} + +void tcg_gen_dupi_vec(unsigned vece, TCGv_vec r, uint64_t a) +{ + switch (vece) { + case MO_8: + tcg_gen_dup8i_vec(r, a); + break; + case MO_16: + tcg_gen_dup16i_vec(r, a); + break; + case MO_32: + tcg_gen_dup32i_vec(r, a); + break; + case MO_64: + tcg_gen_dup64i_vec(r, a); + break; + default: + g_assert_not_reached(); + } } void tcg_gen_movi_v64(TCGv_vec r, uint64_t a) diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs index 228cd84fa4..d381a02f34 100644 --- a/accel/tcg/Makefile.objs +++ b/accel/tcg/Makefile.objs @@ -1,6 +1,6 @@ obj-$(CONFIG_SOFTMMU) += tcg-all.o obj-$(CONFIG_SOFTMMU) += cputlb.o -obj-y += tcg-runtime.o +obj-y += tcg-runtime.o tcg-runtime-gvec.o obj-y += cpu-exec.o cpu-exec-common.o translate-all.o obj-y += translator.o From patchwork Mon Dec 18 17:17:35 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122248 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3105845qgn; Mon, 18 Dec 2017 09:24:45 -0800 (PST) X-Google-Smtp-Source: ACJfBotmy3rB/LRAD5t2WHv+H5bptQ8wrn3sXvB4ZaRcMrsqAvTQj+JSuQhL5kZB1jkeRqSUAtZq X-Received: by 10.129.41.18 with SMTP id p18mr378900ywp.245.1513617885121; Mon, 18 Dec 2017 09:24:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617885; cv=none; d=google.com; s=arc-20160816; b=B1I1oTGCahdupHHj//EEJYcsiFZ+DdMstz/g8Dkr3IIQcskeWTkJmZewGMoKwFc2g+ qr424oLh6bnbbGrXHkaFTkC39I79Xt81RqwsRpQu2QUEixn4BLfPeb0/tG2bHpqKuKWo 3iru3SFDAx0CZmq3nFUm2cTMQQZvwwjIw/IXBJzBplNAbz2G5ToK9+TnQK0VRG89B9Ou Cl4ZkIhAPKs8N9RPImVnuCJIQbHyin910lNm+IsUNtSqmli+Ftt790u8e5M8ndynAVKS TNYgmGcIiA/rmWFSJ5vXGXYmDCpC8RBQOM1NonRlQgLQumABK0bJhUqz5HkhetwiMEW0 70CQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=hWNeV+zOx+oAfKu9VCxGrhv2jH/fOaNDlWpbf3/FB4Y=; b=osO2t2HQnax7VrcFbMUShSpU6egIptGvjVav/3SjZQr9PofhtVfhyTsdHDCf99x1kW 8jKR9OTu0tyIwFLDk7hrNKkvLaJ+293KI806dMMtARKOHdNDUPq+2VUlSeoy4/H+xaWU XSXhpEZ+D73FlKmq/GmWyrbNaMo5XXvvsm22x6FhXIrxeRaKUixY3oSyAO4msBwP9BaJ qjI2mNbFFF/mSqORV2yOzY3R420zzIs5hZrNMlugnda+OSvoaAYjiwNzp6QiseN1JuGe qDo06YokSy2oXRHLeSGCBG5LIZGyGlDRCNlPMgmba1pHFiyNLujo6ikcajpiUgC5XaMp R1sw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=TP1P3lv9; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id c187si2487391ywf.364.2017.12.18.09.24.45 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:24:45 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=TP1P3lv9; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58555 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz9g-0004qD-GF for patch@linaro.org; Mon, 18 Dec 2017 12:24:44 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36844) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3Q-0000Lw-M8 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3H-0000mR-7G for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:16 -0500 Received: from mail-pg0-x243.google.com ([2607:f8b0:400e:c05::243]:38622) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3G-0000l8-Vr for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:07 -0500 Received: by mail-pg0-x243.google.com with SMTP id f12so9402659pgo.5 for ; Mon, 18 Dec 2017 09:18:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=hWNeV+zOx+oAfKu9VCxGrhv2jH/fOaNDlWpbf3/FB4Y=; b=TP1P3lv9EUE800AUxFWtPnWLn0XD+bQRwRhurZSF6XkwFEBhDYEwvkT8ntaTQCOCjq LcpEvX1L5IZz97tcZ2JD3hXoXz0qvoSNX6CT44/FJjYnSsYdToQyWCnPyYfQGxH00rDO G/knxq/e9OBOhcgLbqSLat3yoqY8hyHMVkZ74= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=hWNeV+zOx+oAfKu9VCxGrhv2jH/fOaNDlWpbf3/FB4Y=; b=Ko8TYCcmCU0PX+ykRBvPsFVUIGUuWoKFF1YHlmNwaImIJ5NMlNaSAmz6jNMynWzlpF l3If/nRP8r4YRFnVpGHUYqmWP39KF7Kf4mF/6nmkz4fGj2QsQJe472iPt9yPUhvXEob6 maym3TxkXtYe16alDlDg9CWYMW242m7lwfjO+CixrXta+M4E8qZKuZW6F8g1JOdTLlQD cXoXtZHG4kYvCqkZAHX9PptgVE5yrlD3KRa00bW2VV/nPVA2u7FBIUKz54rrhru6aqqT 6FVYbcAcIiQlXvn8Ka/ZB4312d3HWLRyhqFC2cdvNvZuCZNwmCfkLshMK2cLafqa3pW6 OoUQ== X-Gm-Message-State: AKGB3mLzKX7lPP0HJHPTFYqsRALA0N7uyS0a5dXIutVTG/ZJSp9VxMph tZ1s7H8U1azmApq8IShT7kuQElw0g3o= X-Received: by 10.101.70.196 with SMTP id n4mr331211pgr.353.1513617485711; Mon, 18 Dec 2017 09:18:05 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.04 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:04 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:35 -0800 Message-Id: <20171218171758.16964-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::243 Subject: [Qemu-devel] [PATCH v7 03/26] tcg: Allow multiple word entries into the constant pool X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" This will be required for storing vector constants. Signed-off-by: Richard Henderson --- tcg/tcg-pool.inc.c | 115 +++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 22 deletions(-) -- 2.14.3 diff --git a/tcg/tcg-pool.inc.c b/tcg/tcg-pool.inc.c index 8a85131405..0f76e7bee3 100644 --- a/tcg/tcg-pool.inc.c +++ b/tcg/tcg-pool.inc.c @@ -22,39 +22,110 @@ typedef struct TCGLabelPoolData { struct TCGLabelPoolData *next; - tcg_target_ulong data; tcg_insn_unit *label; - intptr_t addend; - int type; + int addend : 32; + int rtype : 16; + int nlong : 16; + tcg_target_ulong data[]; } TCGLabelPoolData; -static void new_pool_label(TCGContext *s, tcg_target_ulong data, int type, - tcg_insn_unit *label, intptr_t addend) +static TCGLabelPoolData *new_pool_alloc(TCGContext *s, int nlong, int rtype, + tcg_insn_unit *label, int addend) { - TCGLabelPoolData *n = tcg_malloc(sizeof(*n)); - TCGLabelPoolData *i, **pp; + TCGLabelPoolData *n = tcg_malloc(sizeof(TCGLabelPoolData) + + sizeof(tcg_target_ulong) * nlong); - n->data = data; n->label = label; - n->type = type; n->addend = addend; + n->rtype = rtype; + n->nlong = nlong; + return n; +} + +static void new_pool_insert(TCGContext *s, TCGLabelPoolData *n) +{ + TCGLabelPoolData *i, **pp; + int nlong = n->nlong; /* Insertion sort on the pool. */ - for (pp = &s->pool_labels; (i = *pp) && i->data < data; pp = &i->next) { - continue; + for (pp = &s->pool_labels; (i = *pp) != NULL; pp = &i->next) { + if (nlong > i->nlong) { + break; + } + if (nlong < i->nlong) { + continue; + } + if (memcmp(n->data, i->data, sizeof(tcg_target_ulong) * nlong) >= 0) { + break; + } } n->next = *pp; *pp = n; } +/* The "usual" for generic integer code. */ +static inline void new_pool_label(TCGContext *s, tcg_target_ulong d, int rtype, + tcg_insn_unit *label, int addend) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 1, rtype, label, addend); + n->data[0] = d; + new_pool_insert(s, n); +} + +/* For v64 or v128, depending on the host. */ +static inline void new_pool_l2(TCGContext *s, int rtype, tcg_insn_unit *label, + int addend, tcg_target_ulong d0, + tcg_target_ulong d1) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 2, rtype, label, addend); + n->data[0] = d0; + n->data[1] = d1; + new_pool_insert(s, n); +} + +/* For v128 or v256, depending on the host. */ +static inline void new_pool_l4(TCGContext *s, int rtype, tcg_insn_unit *label, + int addend, tcg_target_ulong d0, + tcg_target_ulong d1, tcg_target_ulong d2, + tcg_target_ulong d3) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 4, rtype, label, addend); + n->data[0] = d0; + n->data[1] = d1; + n->data[2] = d2; + n->data[3] = d3; + new_pool_insert(s, n); +} + +/* For v256, for 32-bit host. */ +static inline void new_pool_l8(TCGContext *s, int rtype, tcg_insn_unit *label, + int addend, tcg_target_ulong d0, + tcg_target_ulong d1, tcg_target_ulong d2, + tcg_target_ulong d3, tcg_target_ulong d4, + tcg_target_ulong d5, tcg_target_ulong d6, + tcg_target_ulong d7) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 8, rtype, label, addend); + n->data[0] = d0; + n->data[1] = d1; + n->data[2] = d2; + n->data[3] = d3; + n->data[4] = d4; + n->data[5] = d5; + n->data[6] = d6; + n->data[7] = d7; + new_pool_insert(s, n); +} + /* To be provided by cpu/tcg-target.inc.c. */ static void tcg_out_nop_fill(tcg_insn_unit *p, int count); static bool tcg_out_pool_finalize(TCGContext *s) { TCGLabelPoolData *p = s->pool_labels; - tcg_target_ulong d, *a; + TCGLabelPoolData *l = NULL; + void *a; if (p == NULL) { return true; @@ -62,24 +133,24 @@ static bool tcg_out_pool_finalize(TCGContext *s) /* ??? Round up to qemu_icache_linesize, but then do not round again when allocating the next TranslationBlock structure. */ - a = (void *)ROUND_UP((uintptr_t)s->code_ptr, sizeof(tcg_target_ulong)); + a = (void *)ROUND_UP((uintptr_t)s->code_ptr, + sizeof(tcg_target_ulong) * p->nlong); tcg_out_nop_fill(s->code_ptr, (tcg_insn_unit *)a - s->code_ptr); s->data_gen_ptr = a; - /* Ensure the first comparison fails. */ - d = p->data + 1; - for (; p != NULL; p = p->next) { - if (p->data != d) { - d = p->data; - if (unlikely((void *)a > s->code_gen_highwater)) { + size_t size = sizeof(tcg_target_ulong) * p->nlong; + if (!l || l->nlong != p->nlong || memcmp(l->data, p->data, size)) { + if (unlikely(a > s->code_gen_highwater)) { return false; } - *a++ = d; + memcpy(a, p->data, size); + a += size; + l = p; } - patch_reloc(p->label, p->type, (intptr_t)(a - 1), p->addend); + patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend); } - s->code_ptr = (void *)a; + s->code_ptr = a; return true; } From patchwork Mon Dec 18 17:17:36 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122242 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3099722qgn; Mon, 18 Dec 2017 09:18:52 -0800 (PST) X-Google-Smtp-Source: ACJfBosZyvVylrplpoHW/5jqz7Zx7edUnMLob2g54IQfsVW1sxJ4IL+7HVYu1MG71C2dVTulL1pP X-Received: by 10.37.24.193 with SMTP id 184mr384440yby.315.1513617532452; Mon, 18 Dec 2017 09:18:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617532; cv=none; d=google.com; s=arc-20160816; b=kQ2cY6JEYEhPB5YlUP7MN2gBbykTSAqY6CtjHzqULAk+q1Xt8CzZUU9g3xynZnYy1B BZgNK9RUSTCI7ynIAPLNQs7OBYf8X83wvDDW+EEg7AAC7oNkmUvmSSunjlinQZbL/8vf I6otUtZc5esGEMDP29XGC912iFWMWkR6z9da0czYSoopk77c3JyIqpRjamaxGi3FuIlB 5iENXCKWNhZQzqp+1ReVmn3adM5pkOD9rVQQFXrrSteSVKsT+TeCHgu+l8Sn7yoyfdpB PoHwG1bc/Ehv9j/Zq5dX5YntznaTkqfps+fkOGGOAfqRoXOmISTz701FwPvUBJaFwIGa AlGg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=GcMnjuke/WDDK9KhYcS2xUXa0sTVT3xT14OmjkG9q8k=; b=eEiTuO+5F3vBwjYF8fWvQLVyj0wE0ZJgZL1dpE5hN9ybk3tfR16Zx0DFx0GLsWxJqA lInN0Rd+xfSp4I++LRw7c1ww4F9EvWv+Go1nKFlYrYE/vCpvktsXQPefroyZfbTq0xfp LhcWFi63QbmcnKNGpfN51XDAWdU3uPfO7/PzVf6DOSIKgPXUFPEbzr1nN4r4K+MDfciP HZb/smujJcZRMNuyXQ8OnXxRVM0BWuEHeQ1y9jtFkR3wq7jg0oCOEZz/MAnl3pZLW+gj 6AH0pp8TE0gx5A1VZpdJuLFOZUl6khhjlnU7EvcWJeEkZphwAqO3HlEy7Lj0TAEGvU+G zdCw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Ml5NGDFt; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id k65si1571076ybk.827.2017.12.18.09.18.52 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:18:52 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Ml5NGDFt; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58429 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3z-0000Q0-Qk for patch@linaro.org; Mon, 18 Dec 2017 12:18:51 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36848) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3Q-0000M0-Nj for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3I-0000no-Dp for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:16 -0500 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:41710) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3I-0000n1-8M for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:08 -0500 Received: by mail-pf0-x241.google.com with SMTP id j28so9930482pfk.8 for ; Mon, 18 Dec 2017 09:18:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=GcMnjuke/WDDK9KhYcS2xUXa0sTVT3xT14OmjkG9q8k=; b=Ml5NGDFtfcZf5ZAdeapCZ3KHLD7e0MRUWecxLmAbeH+KqaTWdR+4FzEnOTS73QnRdE dLayD/atRVIw83hr790dE6BwD1Yhod9MET+rV43yO6ztABI7lC+sv2q0PFbPx6srvqqy F16wsir3CIa0rH6TFXW3qmQCEAZxjIG4GLnwc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GcMnjuke/WDDK9KhYcS2xUXa0sTVT3xT14OmjkG9q8k=; b=dAcNgMaK/hJ5vwCTqEXjcbiKKqfCDoE+Z1POBXWnljiRKw6x90WDSO1nHFYU8vcqbr ExYhtoQ4XNBa26wqavv/ah6J4EpSPA2SBjF+cA8kvfRrVO4nrZH1lRbVU9CVZFvsn1ji 3e8Q5rVf0Q3uWUx5P4b5f6nHReF8D8eimRv/tIGw4xgnSS3M+hf0NGtqa7pszQG47FB4 oQp0pC1+aYWGxitdegdQbbIT41DZQh4XTuYQLsQwzMi7sY0wh0hd1bnUiC6UwnN1mFWH HG36xtKmouTKTJV/UAitpJVYv/QONjd9HSePaYeQoxW+CJ0oZnwTKEutvlZOrPiybZOX Y3Zw== X-Gm-Message-State: AKGB3mLI9C98Hh8Qylib8tk0BIN8+4KAju2vREHlM8K1Cy2BgWZdGnOo FGo6aOdxZaMSPxS4PADFqyFYxsEOwy4= X-Received: by 10.101.96.213 with SMTP id r21mr322215pgv.395.1513617486960; Mon, 18 Dec 2017 09:18:06 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.05 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:06 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:36 -0800 Message-Id: <20171218171758.16964-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v7 04/26] target/arm: Align vector registers X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- target/arm/cpu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.14.3 Reviewed-by: Philippe Mathieu-Daudé diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 96316700dd..3ff4dea6b8 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -492,7 +492,7 @@ typedef struct CPUARMState { * the two execution states, and means we do not need to explicitly * map these registers when changing states. */ - float64 regs[64]; + float64 regs[64] QEMU_ALIGNED(16); uint32_t xregs[16]; /* We store these fpcsr fields separately for convenience. */ From patchwork Mon Dec 18 17:17:37 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122243 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3099795qgn; Mon, 18 Dec 2017 09:18:56 -0800 (PST) X-Google-Smtp-Source: ACJfBos64RyZMFMqL96M199XtuIYfA+W5EplSRcWcF0LregAdKPZcCe1sP3yyUuFqhbzC1KNoG4x X-Received: by 10.13.224.5 with SMTP id j5mr385648ywe.326.1513617536621; Mon, 18 Dec 2017 09:18:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617536; cv=none; d=google.com; s=arc-20160816; b=i2N0MpYNhW8sgLVjbrYQmGK4Q0XwuZ+Dt+j+C8i3CjXsjZsPctsomxEvj7Wmbumwd3 R9B3+zykXGlVzmvqS5Z+oiuhJhAJ0DZlMmKMKJyfOlQHCWMXXHghHAkLN2HdHb14H1XR XW8zb+0w09y92PssB69/EEnTMMzXIBEVbraATb/xgELD+qjEWV4AmpL8Flk/1Ubg2XdB sppbbn+OlVafsx+52Yj1XyQK0HhnqWrjLMhjB9u0XLGG3dORc/4u5S0e0GDndH6qFNYJ UsVHgPafmXagQnyRYHmJapyRfmOU9Q5EsGzjXwXKq1FqtJiFWbxVl7a2uln7N03GQaAV eLcw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=Ua176QlergBBCigQAYx4Bh/qGHxv4J7O/KNvAEHjYFo=; b=Ux4HFPQpSBdq75/nUSBrQN04BapaZ3wq/tM9yZP6yiJzFcNtkid3SatGBFhEuB771H TJIqpvarNL2mOFBq+EAJbq996Q5SNIZ2ByHt3RZW9sxyJ07LR+w1Jeiqzor5LtI8vQdh r0fhesizkRWZcBd/BqFc4ybZFgd+pGaKSfNf+Kw5jjoUKbEeR5q2D7xv95IjTvjma0CA 36ptVY/gUYi40xHOQ023KZ9LcN9vjdwcFpgM08khFRwMSoNQ/WExzcW9yogt+8hIlaOE R76w6CXjIBoNaugFUyam+a17WyTG/LOhVGOfjAzbdiyyFrnwy2JkXTPRVRcaY95bHVL3 5QYA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=O5ppDnBv; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id h74si2491298ywc.302.2017.12.18.09.18.56 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:18:56 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=O5ppDnBv; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58430 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz43-0000R6-UW for patch@linaro.org; Mon, 18 Dec 2017 12:18:56 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36877) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3R-0000Ma-CV for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3K-0000rI-Dl for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:17 -0500 Received: from mail-pl0-x241.google.com ([2607:f8b0:400e:c01::241]:36438) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3K-0000pf-5C for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:10 -0500 Received: by mail-pl0-x241.google.com with SMTP id b12so5185860plm.3 for ; Mon, 18 Dec 2017 09:18:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Ua176QlergBBCigQAYx4Bh/qGHxv4J7O/KNvAEHjYFo=; b=O5ppDnBvZUe6BcM5vyYtwLhxShppBIw7Dou4Dusr/4eQ5x118FzpyK4EE6QMT+MgqO 9enRTbtqvswftruJRJZmDM5R/eNMh1NvpXz5H9DNQ1Tf8oFCiJR/Q71Cn2pENPryGs1e QW5Kau8o0OZi9BpLZ+KfIm5uUb/bQBiU3GMs8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ua176QlergBBCigQAYx4Bh/qGHxv4J7O/KNvAEHjYFo=; b=Fuaio1vkiAWoytISWUO8b/lBHRD7MmR1AgN8ZYNRVN3jxbXFtGCjYB/IPzCsXpe0lw e5ajU4jNVleLqk5LHedDnP9LuVfc21fg9EAw71XPAEN5L05TUojRyh4B7gL+uR8to8Rj ilkdeGLGGsVRj8lnKoZC/uzWfe920LhT+TvX9lTNiv++vRpljmQ/2Rb4n6hAmMq2y9kG If5aifU6uFs6PnBFdRqABkUuRzwS2ODMr4EgVfAcSGKxmW5GCT1Zp8YvxhVdGZkoCKXi N//MdL8tLj1EoEJW+fI8F5wxMvKk+a0OoZnqaJzZfM+qvnEses3TZeO7SsiVcPgu/TvZ 7org== X-Gm-Message-State: AKGB3mKF42sD1q2bJIiOT/xnViK5TrLLL7xv6lDyAGkUUW5pLk20eigW xRqKZbg35DQypmDjaGKBxkNrsWct34E= X-Received: by 10.159.233.135 with SMTP id bh7mr424304plb.52.1513617488730; Mon, 18 Dec 2017 09:18:08 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.07 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:07 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:37 -0800 Message-Id: <20171218171758.16964-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::241 Subject: [Qemu-devel] [PATCH v7 05/26] target/arm: Use vector infrastructure for aa64 add/sub/logic X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 207 +++++++++++++++++++++++++++++---------------- 1 file changed, 134 insertions(+), 73 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index ba94f7d045..572af456d1 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -21,6 +21,7 @@ #include "cpu.h" #include "exec/exec-all.h" #include "tcg-op.h" +#include "tcg-op-gvec.h" #include "qemu/log.h" #include "arm_ldst.h" #include "translate.h" @@ -83,6 +84,10 @@ typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64); typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32); typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32); +/* Note that the gvec expanders operate on offsets + sizes. */ +typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t, + uint32_t, uint32_t, uint32_t); + /* initialize TCG globals. */ void a64_translate_init(void) { @@ -535,6 +540,21 @@ static inline int vec_reg_offset(DisasContext *s, int regno, return offs; } +/* Return the offset info CPUARMState of the "whole" vector register Qn. */ +static inline int vec_full_reg_offset(DisasContext *s, int regno) +{ + assert_fp_access_checked(s); + return offsetof(CPUARMState, vfp.regs[regno * 2]); +} + +/* Return the byte size of the "whole" vector register, VL / 8. */ +static inline int vec_full_reg_size(DisasContext *s) +{ + /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags. + In the meantime this is just the AdvSIMD length of 128. */ + return 128 / 8; +} + /* Return the offset into CPUARMState of a slice (from * the least significant end) of FP register Qn (ie * Dn, Sn, Hn or Bn). @@ -9048,85 +9068,125 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) } } +static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) +{ + tcg_gen_xor_i64(rn, rn, rm); + tcg_gen_and_i64(rn, rn, rd); + tcg_gen_xor_i64(rd, rm, rn); +} + +static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) +{ + tcg_gen_xor_i64(rn, rn, rd); + tcg_gen_and_i64(rn, rn, rm); + tcg_gen_xor_i64(rd, rd, rn); +} + +static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) +{ + tcg_gen_xor_i64(rn, rn, rd); + tcg_gen_andc_i64(rn, rn, rm); + tcg_gen_xor_i64(rd, rd, rn); +} + +static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) +{ + tcg_gen_xor_vec(vece, rn, rn, rm); + tcg_gen_and_vec(vece, rn, rn, rd); + tcg_gen_xor_vec(vece, rd, rm, rn); +} + +static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) +{ + tcg_gen_xor_vec(vece, rn, rn, rd); + tcg_gen_and_vec(vece, rn, rn, rm); + tcg_gen_xor_vec(vece, rd, rd, rn); +} + +static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) +{ + tcg_gen_xor_vec(vece, rn, rn, rd); + tcg_gen_andc_vec(vece, rn, rn, rm); + tcg_gen_xor_vec(vece, rd, rd, rn); +} + /* Logic op (opcode == 3) subgroup of C3.6.16. */ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn) { + static const GVecGen3 bsl_op = { + .fni8 = gen_bsl_i64, + .fniv = gen_bsl_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true + }; + static const GVecGen3 bit_op = { + .fni8 = gen_bit_i64, + .fniv = gen_bit_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true + }; + static const GVecGen3 bif_op = { + .fni8 = gen_bif_i64, + .fniv = gen_bif_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true + }; + int rd = extract32(insn, 0, 5); int rn = extract32(insn, 5, 5); int rm = extract32(insn, 16, 5); int size = extract32(insn, 22, 2); bool is_u = extract32(insn, 29, 1); bool is_q = extract32(insn, 30, 1); - TCGv_i64 tcg_op1, tcg_op2, tcg_res[2]; - int pass; + GVecGen3Fn *gvec_fn; + const GVecGen3 *gvec_op; if (!fp_access_check(s)) { return; } - tcg_op1 = tcg_temp_new_i64(); - tcg_op2 = tcg_temp_new_i64(); - tcg_res[0] = tcg_temp_new_i64(); - tcg_res[1] = tcg_temp_new_i64(); - - for (pass = 0; pass < (is_q ? 2 : 1); pass++) { - read_vec_element(s, tcg_op1, rn, pass, MO_64); - read_vec_element(s, tcg_op2, rm, pass, MO_64); - - if (!is_u) { - switch (size) { - case 0: /* AND */ - tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 1: /* BIC */ - tcg_gen_andc_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 2: /* ORR */ - tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 3: /* ORN */ - tcg_gen_orc_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - } - } else { - if (size != 0) { - /* B* ops need res loaded to operate on */ - read_vec_element(s, tcg_res[pass], rd, pass, MO_64); - } - - switch (size) { - case 0: /* EOR */ - tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 1: /* BSL bitwise select */ - tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2); - tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]); - tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1); - break; - case 2: /* BIT, bitwise insert if true */ - tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]); - tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2); - tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1); - break; - case 3: /* BIF, bitwise insert if false */ - tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]); - tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2); - tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1); - break; - } - } - } + switch (size + 4 * is_u) { + case 0: /* AND */ + gvec_fn = tcg_gen_gvec_and; + goto do_fn; + case 1: /* BIC */ + gvec_fn = tcg_gen_gvec_andc; + goto do_fn; + case 2: /* ORR */ + gvec_fn = tcg_gen_gvec_or; + goto do_fn; + case 3: /* ORN */ + gvec_fn = tcg_gen_gvec_orc; + goto do_fn; + case 4: /* EOR */ + gvec_fn = tcg_gen_gvec_xor; + goto do_fn; + do_fn: + gvec_fn(0, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + + case 5: /* BSL bitwise select */ + gvec_op = &bsl_op; + goto do_op; + case 6: /* BIT, bitwise insert if true */ + gvec_op = &bit_op; + goto do_op; + case 7: /* BIF, bitwise insert if false */ + gvec_op = &bif_op; + goto do_op; + do_op: + tcg_gen_gvec_3(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), gvec_op); + return; - write_vec_element(s, tcg_res[0], rd, 0, MO_64); - if (!is_q) { - tcg_gen_movi_i64(tcg_res[1], 0); + default: + g_assert_not_reached(); } - write_vec_element(s, tcg_res[1], rd, 1, MO_64); - - tcg_temp_free_i64(tcg_op1); - tcg_temp_free_i64(tcg_op2); - tcg_temp_free_i64(tcg_res[0]); - tcg_temp_free_i64(tcg_res[1]); } /* Helper functions for 32 bit comparisons */ @@ -9387,6 +9447,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) int rn = extract32(insn, 5, 5); int rd = extract32(insn, 0, 5); int pass; + GVecGen3Fn *gvec_op; switch (opcode) { case 0x13: /* MUL, PMUL */ @@ -9426,6 +9487,16 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) return; } + switch (opcode) { + case 0x10: /* ADD, SUB */ + gvec_op = u ? tcg_gen_gvec_sub : tcg_gen_gvec_add; + gvec_op(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } + if (size == 3) { assert(is_q); for (pass = 0; pass < 2; pass++) { @@ -9598,16 +9669,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn = fns[size][u]; break; } - case 0x10: /* ADD, SUB */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 }, - { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 }, - { tcg_gen_add_i32, tcg_gen_sub_i32 }, - }; - genfn = fns[size][u]; - break; - } case 0x11: /* CMTST, CMEQ */ { static NeonGenTwoOpFn * const fns[3][2] = { From patchwork Mon Dec 18 17:17:38 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122251 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3108933qgn; Mon, 18 Dec 2017 09:27:51 -0800 (PST) X-Google-Smtp-Source: ACJfBovip8J4SxhxPdzGsW7EkHy+mIfWVHrYf985nVhboX3Wk02NVPG/SMMgn0+35qkXEw2SOVf+ X-Received: by 10.37.239.69 with SMTP id w5mr448360ybm.31.1513618071136; Mon, 18 Dec 2017 09:27:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618071; cv=none; d=google.com; s=arc-20160816; b=kc200ONFsNvtpWYmD+1t7QP1qDrTqzXHDQ9K3m6k29sL7iiJZ33905juVoHUUGfeku 57s4x1rwPyEhG+xNPWHDmKaLuuQrjLpJauPt2PDOJtHHDOaZOHu+/YXPIPy+FGz0rLpV EI4uW4uQ/E2LbJCm/4w4zXUX6SfiyM/TskxY2rs4Xe7uj4HmVvuAB0e4w+CHvoY79YoP Wz5m26QHOI/IqUHjn8TFgN4jeiWxYne/SJbiUPqyMs8bjdRZiicIRjfa2ggsPPHDkUoY CyVbZiIBvzNl8oBRsr4IrtnMH/qD5nio7dXDTYI9sqrYDsacClbJ0Q5TV5cFeG022xOD 3jnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=acxGEq2fNn/6enx58EagUxLLjzbVqjr4Z7gQAnNLh/k=; b=wnXcYFr5v0xp5iUCWMFSLq+5spcuGnLxBu2sWjgKFaW7NMsDXHWtwH84fg1ry7z0ji fR3YKq/sviVWcOr4ZGOb0irzkBLnrnr/RLen2KF6hemqoqfByUTWZUeSYONgCmqMLY7j o2X0stENxM47Z/39ZNbTUIK0AKO5r/DBtSxSSXHuZsJKUFH3CvFNq8JfKdLmS5Lty+M5 FwnH85X0vwLUuNXwfKHUHDJ+vHlBXd3+lsGUX5I5RS6qDhBacNEQVhSTg7qhpTT7l3TU nDW/GRUXXRWjqJf+Lj9qawlGYZHUh8xdUTjrtY0k7TIpV3VJWYbHCBWmpqIXqp64nDka 5Xsw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=dAbs6SAN; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id y4si2491512ybj.405.2017.12.18.09.27.51 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:27:51 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=dAbs6SAN; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58577 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzCg-0007XA-HS for patch@linaro.org; Mon, 18 Dec 2017 12:27:50 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36885) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3R-0000Mh-Hp for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3L-0000tp-RF for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:17 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:34425) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3L-0000sH-Gx for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:11 -0500 Received: by mail-pl0-x244.google.com with SMTP id d21so5189300pll.1 for ; Mon, 18 Dec 2017 09:18:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=acxGEq2fNn/6enx58EagUxLLjzbVqjr4Z7gQAnNLh/k=; b=dAbs6SANMDf82UMcx1CQvLrlSANGDNPX+KbD8UMGYM6rDsKYQBVE0zYRh8OrlX1Mak yMXwKlAL4t2yPGH7hZ4fHA8h5/ZmHgZc6CcDoJyS6gdIem0MArSpUQhPMSQmOxHmqU4k HLcui8wi6e9a0IGVVZ9y5y6MM2/qfb64o0lLg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=acxGEq2fNn/6enx58EagUxLLjzbVqjr4Z7gQAnNLh/k=; b=UOq4xqnyg7jyt1kwMevr7S0ccpuI/Aq2hgfjUMWvOYBffSsGdIhiuGF+axaB2tzkqs 3tEpYefJ2pcEOhh9N8lvy2VpoywmvzOvTwAfwy1OXNtCZ7fimlKSw4GFleafB5oR1Gh3 3dXhZ2JTUjo1vmUbSkggI8QD6hTxKHQmzjC/kom7f2WpRHEjNYPcbOD+ToM9iqgQxFhx 15tmB9EZdgPKxDtaoO6KIyhKhhjXukyBL9acAUnhzgejF7K33eMzY0iVkNCgXPaLfHQC TzsUd/PIJaatlE0GiusebNWtyU3gkZ6P+qgg+VagrijU2SyjjgSymydt/RbeejXdgiER wzOw== X-Gm-Message-State: AKGB3mKtBC22eYf+nirVfKs+qFgU78K9NKt8E3Mvy38dU+mVaM5Z2nIE vFOynSHccUnoVhUUFX5K2gVkdaeLnp0= X-Received: by 10.84.224.75 with SMTP id a11mr369900plt.421.1513617490188; Mon, 18 Dec 2017 09:18:10 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.08 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:09 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:38 -0800 Message-Id: <20171218171758.16964-7-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v7 06/26] target/arm: Use vector infrastructure for aa64 mov/not/neg X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 43 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 5 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 572af456d1..bc14c28e71 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -85,6 +85,7 @@ typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32); typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32); /* Note that the gvec expanders operate on offsets + sizes. */ +typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t); typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); @@ -4579,14 +4580,19 @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) TCGv_i64 tcg_op; TCGv_i64 tcg_res; + switch (opcode) { + case 0x0: /* FMOV */ + tcg_gen_gvec_mov(0, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + 8, vec_full_reg_size(s)); + return; + } + fpst = get_fpstatus_ptr(); tcg_op = read_fp_dreg(s, rn); tcg_res = tcg_temp_new_i64(); switch (opcode) { - case 0x0: /* FMOV */ - tcg_gen_mov_i64(tcg_res, tcg_op); - break; case 0x1: /* FABS */ gen_helper_vfp_absd(tcg_res, tcg_op); break; @@ -9153,6 +9159,12 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn) gvec_fn = tcg_gen_gvec_andc; goto do_fn; case 2: /* ORR */ + if (rn == rm) { /* MOV */ + tcg_gen_gvec_mov(0, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } gvec_fn = tcg_gen_gvec_or; goto do_fn; case 3: /* ORN */ @@ -10032,6 +10044,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) int rmode = -1; TCGv_i32 tcg_rmode; TCGv_ptr tcg_fpstatus; + GVecGen2Fn *gvec_fn; switch (opcode) { case 0x0: /* REV64, REV32 */ @@ -10040,8 +10053,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) return; case 0x5: /* CNT, NOT, RBIT */ if (u && size == 0) { - /* NOT: adjust size so we can use the 64-bits-at-a-time loop. */ - size = 3; + /* NOT */ break; } else if (u && size == 1) { /* RBIT */ @@ -10293,6 +10305,27 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) tcg_rmode = NULL; } + switch (opcode) { + case 0x5: + if (u && size == 0) { /* NOT */ + gvec_fn = tcg_gen_gvec_not; + goto do_fn; + } + break; + case 0xb: + if (u) { /* NEG */ + gvec_fn = tcg_gen_gvec_neg; + goto do_fn; + } + break; + + do_fn: + gvec_fn(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } + if (size == 3) { /* All 64-bit element operations can be shared with scalar 2misc */ int pass; From patchwork Mon Dec 18 17:17:39 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122244 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3099815qgn; Mon, 18 Dec 2017 09:18:58 -0800 (PST) X-Google-Smtp-Source: ACJfBouLqr5FQRUtLryvxKjsENRa2AcNv0rf/7xtzZi7gsv9D8f9RGyCJb91ZC1T8dOnxYylJuqD X-Received: by 10.37.52.130 with SMTP id b124mr442586yba.54.1513617538522; Mon, 18 Dec 2017 09:18:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617538; cv=none; d=google.com; s=arc-20160816; b=UF86t552QORo/pKsl0mGZaSm19e7qJWvZU0bKau0SzHHvl4lSOgF8JeDFCt20enxVQ P1i5E+9AgV4hVO0Qc+Q2pi3B8BFU903+4heuBpDA3Ot+VisKq/sL/HlbaO0XkNSO/A9d pebVwR4q/c+ik5susNPCg+c5c0ivkdfXJSbNKfiWzdJw1hcy+GOKdzl/szOrr190c0y2 DGrS9rlhVk2xFtXJimZFZf6itElP/w8EF7kh6Jul5DAWbZf5FaGX7StELQglQwdXAVvZ badLi+uiP+VwTwSZetcHHsapCrdWXOit2z3zl1GoGAK9W5NcbzoLb9omSamer0goeORK Jj/w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=okhsN7PvAI2tSwB7zlFXSFL3pRM/854PTOSz2h1XJ54=; b=XFMaiEo8s3P+o+XbkLw5kkimc7PW6K4l9VERBDddSoa9tCMjUi5aEFNpU5gdbIQait g8rOe1s7cmNgWS+SPr4XusUZYT8DFopZD9zwFn2IOUlOClse85lOboPxm8FhlsS85zO1 f3VHQ3USEtJfKyvYyzJYi0P3HQqSc20gdLBbFtB9GhnTaQcXlzKwHjp/xZAeIAItK7DQ 7J2mJZelp8VlzAj1aXjxHW0S1RCgzwzxe/zVsLUqdIzMDZJQ/9QIR34+KeteJl+dbXtQ ZXM7DKogy+SUJANHIYSPnoJadRVtjpIG9fse6UppkEJY27EhiWb0rU5d6/rKEap6z5ua 4s0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=bn+hPXnC; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id a12si2536912ybl.107.2017.12.18.09.18.58 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:18:58 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=bn+hPXnC; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58431 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz45-0000S9-U5 for patch@linaro.org; Mon, 18 Dec 2017 12:18:58 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36919) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3S-0000NX-Fi for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:23 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3N-0000vw-6N for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:18 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:35728) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3M-0000ug-Qh for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:13 -0500 Received: by mail-pl0-x244.google.com with SMTP id b96so5198796pli.2 for ; Mon, 18 Dec 2017 09:18:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=okhsN7PvAI2tSwB7zlFXSFL3pRM/854PTOSz2h1XJ54=; b=bn+hPXnCdTK6URW8RJxHaQ+DMON9ue+1qjIezTmLBvzRdhfIskqSCKx/LPavGVZNxU JfwK2jWyc0WPghkdTBOm64LxztWpQisSG71AS+2hfdY5NxYSi/BSCTw8WgWRZcPJz7/8 w1vj+tJv4xHCXH9Ma+RtaTI/aVw1/p2Jy8VPs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=okhsN7PvAI2tSwB7zlFXSFL3pRM/854PTOSz2h1XJ54=; b=fAsgWh6Lr5etqWh5HIKzKA9b3776ObUJcaPCN/bD4CvCtls6OeyLvymrAAqOu97afL uXnLnxFjoLc+5/mhuI5TZoNTezAN56Shiy2TyCcpFHClhABEWDDtk2sPJcLtka4cr81Q rYJ5pIQ5yqjFrFc38EKkfclU2Kw0i8xbLWcj0l/cvtS+56TSHVSYEZjrOVW5rUst4Xvh gELTXPNPknLhPsDJvJoYaNcVl6w/CuwwelaVk2e074zIIKr/z0AfbEaV6hw/Hn335Xg5 LrHAadfHwFdA3ZYhpo4jW7FS4ws4VlTzYIHROchr70TWyFZpLnG7HHhqE2dgpETZE1Pu rkGA== X-Gm-Message-State: AKGB3mIQuLZoOuhMdYEK0lFqh3ij2y2HiTGFRW55QAQSZmsmdmyPo+G+ szVQQlGEM9UldAc5kx7tP+rJxyenjFY= X-Received: by 10.84.241.67 with SMTP id u3mr389962plm.275.1513617491587; Mon, 18 Dec 2017 09:18:11 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.10 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:10 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:39 -0800 Message-Id: <20171218171758.16964-8-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v7 07/26] target/arm: Use vector infrastructure for aa64 dup/movi X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 83 +++++++++++++++++++--------------------------- 1 file changed, 34 insertions(+), 49 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index bc14c28e71..55a4902fc2 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -5846,38 +5846,24 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) * * size: encoded in imm5 (see ARM ARM LowestSetBit()) */ + static void handle_simd_dupe(DisasContext *s, int is_q, int rd, int rn, int imm5) { int size = ctz32(imm5); - int esize = 8 << size; - int elements = (is_q ? 128 : 64) / esize; - int index, i; - TCGv_i64 tmp; + int index = imm5 >> (size + 1); if (size > 3 || (size == 3 && !is_q)) { unallocated_encoding(s); return; } - if (!fp_access_check(s)) { return; } - index = imm5 >> (size + 1); - - tmp = tcg_temp_new_i64(); - read_vec_element(s, tmp, rn, index, size); - - for (i = 0; i < elements; i++) { - write_vec_element(s, tmp, rd, i, size); - } - - if (!is_q) { - clear_vec_high(s, rd); - } - - tcg_temp_free_i64(tmp); + tcg_gen_gvec_dup_mem(size, vec_full_reg_offset(s, rd), + vec_reg_offset(s, rn, index, size), + is_q ? 16 : 8, vec_full_reg_size(s)); } /* DUP (element, scalar) @@ -5926,9 +5912,7 @@ static void handle_simd_dupg(DisasContext *s, int is_q, int rd, int rn, int imm5) { int size = ctz32(imm5); - int esize = 8 << size; - int elements = (is_q ? 128 : 64)/esize; - int i = 0; + uint32_t dofs, oprsz, maxsz; if (size > 3 || ((size == 3) && !is_q)) { unallocated_encoding(s); @@ -5939,12 +5923,11 @@ static void handle_simd_dupg(DisasContext *s, int is_q, int rd, int rn, return; } - for (i = 0; i < elements; i++) { - write_vec_element(s, cpu_reg(s, rn), rd, i, size); - } - if (!is_q) { - clear_vec_high(s, rd); - } + dofs = vec_full_reg_offset(s, rd); + oprsz = is_q ? 16 : 8; + maxsz = vec_full_reg_size(s); + + tcg_gen_gvec_dup_i64(size, dofs, oprsz, maxsz, cpu_reg(s, rn)); } /* INS (Element) @@ -6135,7 +6118,6 @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn) bool is_neg = extract32(insn, 29, 1); bool is_q = extract32(insn, 30, 1); uint64_t imm = 0; - TCGv_i64 tcg_rd, tcg_imm; int i; if (o2 != 0 || ((cmode == 0xf) && is_neg && !is_q)) { @@ -6217,32 +6199,35 @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn) imm = ~imm; } - tcg_imm = tcg_const_i64(imm); - tcg_rd = new_tmp_a64(s); + if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) { + /* MOVI or MVNI, with MVNI negation handled above. */ + tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), is_q ? 16 : 8, + vec_full_reg_size(s), imm); + } else { + TCGv_i64 tcg_imm = tcg_const_i64(imm); + TCGv_i64 tcg_rd = new_tmp_a64(s); - for (i = 0; i < 2; i++) { - int foffs = i ? fp_reg_hi_offset(s, rd) : fp_reg_offset(s, rd, MO_64); + for (i = 0; i < 2; i++) { + int foffs = vec_reg_offset(s, rd, i, MO_64); - if (i == 1 && !is_q) { - /* non-quad ops clear high half of vector */ - tcg_gen_movi_i64(tcg_rd, 0); - } else if ((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9) { - tcg_gen_ld_i64(tcg_rd, cpu_env, foffs); - if (is_neg) { - /* AND (BIC) */ - tcg_gen_and_i64(tcg_rd, tcg_rd, tcg_imm); + if (i == 1 && !is_q) { + /* non-quad ops clear high half of vector */ + tcg_gen_movi_i64(tcg_rd, 0); } else { - /* ORR */ - tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_imm); + tcg_gen_ld_i64(tcg_rd, cpu_env, foffs); + if (is_neg) { + /* AND (BIC) */ + tcg_gen_and_i64(tcg_rd, tcg_rd, tcg_imm); + } else { + /* ORR */ + tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_imm); + } } - } else { - /* MOVI */ - tcg_gen_mov_i64(tcg_rd, tcg_imm); + tcg_gen_st_i64(tcg_rd, cpu_env, foffs); } - tcg_gen_st_i64(tcg_rd, cpu_env, foffs); - } - tcg_temp_free_i64(tcg_imm); + tcg_temp_free_i64(tcg_imm); + } } /* AdvSIMD scalar copy From patchwork Mon Dec 18 17:17:40 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122253 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3109625qgn; Mon, 18 Dec 2017 09:28:34 -0800 (PST) X-Google-Smtp-Source: ACJfBosVmmBbFOeME1AAxz+U1M8/ewEJBWcjvzp4xkEy7YUEpw+1J6mQJIQ94L8ZRoGBBH0lUZ4y X-Received: by 10.129.163.144 with SMTP id a138mr406074ywh.238.1513618114634; Mon, 18 Dec 2017 09:28:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618114; cv=none; d=google.com; s=arc-20160816; b=urUBJkhVMgD1GQVov6djN+xVU8tu/MuMmKeWurz6By4LufTN5uuMTVWRmfiTPoSUwy b9IXbvntUQ6pVbYg3bD5Ua2Xq57iT045+UFx7aBpDQnmBNkDizvFNkmXLUFFI2A+/+Py 4ZTZkE7MAzy4lKX6K61ft0Qb46h2vSkvsCAtHf6i/gNxKmbXwKKosyO85eOQJs31to0T L20kFJmWLRwCVSJpXTwcED7ibHbRXGmhQ7bk+IQ3wLlZpKTRYs0nx7mudmIQndSaxbie EG4KypeYaZ/cVfV6fRiP2w73A7DjOmeFi5MSe5lznIHZNc8qhugoSbiRXGFoRMODuyDX XRqQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=LTdMxHaw47E9TYxQrq6VbKex/Iqfbmbeckq8Soi7/4c=; b=JbIXqJo0BaxgAUzXiGVj/R9Ar0ULxkPGdKuI4Md1czV4SoPYuB/I8fqVHBjd/91jkc fvn2zPSIA/MdhtECYOMkbfPMgWD+84HS0d3aSJJaj7zUP+czohqGOVOycvB9p1FNhFLo BdsjKJwYSK4BSplla76K7mSkpXOcQTQB5qIOrY2R52xFcaK06ei+iAMH9t3U7TWVqdZk J01HuWLBxYxmOWFJMqE45xo/qtEyTIlayENsOdocY5GyaYaBj6mMPNvRlatVNceK5XvZ ETtQ/rE40sr0x5nRWrpllCtFpkYaS83vh9MsMfnyn7ub4H/oey7CB6f7SUb2RhBxxmDQ hw+Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=D2qoIDjW; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id o186si2510917ybo.525.2017.12.18.09.28.34 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:28:34 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=D2qoIDjW; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58592 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzDN-0000Zx-VA for patch@linaro.org; Mon, 18 Dec 2017 12:28:34 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37147) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3Y-0000RL-O3 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:32 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3P-0000zl-RA for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:24 -0500 Received: from mail-pl0-x243.google.com ([2607:f8b0:400e:c01::243]:43230) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3P-0000yE-9Y for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:15 -0500 Received: by mail-pl0-x243.google.com with SMTP id z5so5199011plo.10 for ; Mon, 18 Dec 2017 09:18:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=LTdMxHaw47E9TYxQrq6VbKex/Iqfbmbeckq8Soi7/4c=; b=D2qoIDjWMVa0XTJFbv9Fnz5Ubsv52+6so4WvT8lApX0baIXh9GfH7SGIowdK1gUfa0 re/YiqdOI58S0nZACo3HAPpAbEJbEqjo1M4X7Kl1OIYCgIRGrEZi4wZmtznPeooupGR2 8TPZ+sN9xm9rtMsNDSfQztOs4g3q+a7774RrQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=LTdMxHaw47E9TYxQrq6VbKex/Iqfbmbeckq8Soi7/4c=; b=AI0T7ksDenRTuaQmp2QWEZE97faIJmP/JeM1B2BxwBLY0sFOKm4JwxMNMuKpz+ikZ/ YJxU5XZLapphW61ZsHJWJ3PCEB4NZhs23Zc28jzaYDHco+AWuAoKeTgWP5semglsCaPX 11bJmkdzPDVmpMR2tiycfLdmKYuhDJkbQSlDp0MaPmVT0PFNxZyZtr5uRthMQP9gr1J8 cWh5V/HC4eB1B7ImXyPpDLzR6IT3/6LuyY9fCExq8FnG0is4xqd+OBz8CXL0CHxQOGvN 8mVq+ExIGTKBq3baSC5DBXVy7DxarQwk2M/oucMEnVqa02LWyK1fgp0teAx2B9mijCEC uFKg== X-Gm-Message-State: AKGB3mI3GcGoNzByM1oLa3MJREy1GYtHQ8Jb5VbxsW6+c850GnPzvPd7 RR3KK4QQtnVnbt4Mkv61YPla3g2a5W4= X-Received: by 10.159.241.133 with SMTP id s5mr424172plr.40.1513617493354; Mon, 18 Dec 2017 09:18:13 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.11 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:12 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:40 -0800 Message-Id: <20171218171758.16964-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::243 Subject: [Qemu-devel] [PATCH v7 08/26] tcg/i386: Add vector operations X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" The x86 vector instruction set is extremely irregular. With newer editions, Intel has filled in some of the blanks. However, we don't get many 64-bit operations until SSE4.2, introduced in 2009. The subsequent edition was for AVX1, introduced in 2011, which added three-operand addressing, and adjusts how all instructions should be encoded. Given the relatively narrow 2 year window between possible to support and desirable to support, and to vastly simplify code maintainence, I am only planning to support AVX1 and later cpus. Signed-off-by: Richard Henderson --- tcg/i386/tcg-target.h | 36 ++- tcg/i386/tcg-target.inc.c | 561 ++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 546 insertions(+), 51 deletions(-) -- 2.14.3 diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index b89dababf4..f9d3fc4a93 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -30,10 +30,10 @@ #ifdef __x86_64__ # define TCG_TARGET_REG_BITS 64 -# define TCG_TARGET_NB_REGS 16 +# define TCG_TARGET_NB_REGS 32 #else # define TCG_TARGET_REG_BITS 32 -# define TCG_TARGET_NB_REGS 8 +# define TCG_TARGET_NB_REGS 24 #endif typedef enum { @@ -56,6 +56,26 @@ typedef enum { TCG_REG_R13, TCG_REG_R14, TCG_REG_R15, + + TCG_REG_XMM0, + TCG_REG_XMM1, + TCG_REG_XMM2, + TCG_REG_XMM3, + TCG_REG_XMM4, + TCG_REG_XMM5, + TCG_REG_XMM6, + TCG_REG_XMM7, + + /* 64-bit registers; likewise always define. */ + TCG_REG_XMM8, + TCG_REG_XMM9, + TCG_REG_XMM10, + TCG_REG_XMM11, + TCG_REG_XMM12, + TCG_REG_XMM13, + TCG_REG_XMM14, + TCG_REG_XMM15, + TCG_REG_RAX = TCG_REG_EAX, TCG_REG_RCX = TCG_REG_ECX, TCG_REG_RDX = TCG_REG_EDX, @@ -77,6 +97,8 @@ typedef enum { extern bool have_bmi1; extern bool have_popcnt; +extern bool have_avx1; +extern bool have_avx2; /* optional instructions */ #define TCG_TARGET_HAS_div2_i32 1 @@ -146,6 +168,16 @@ extern bool have_popcnt; #define TCG_TARGET_HAS_mulsh_i64 0 #endif +/* We do not support older SSE systems, only beginning with AVX1. */ +#define TCG_TARGET_HAS_v64 have_avx1 +#define TCG_TARGET_HAS_v128 have_avx1 +#define TCG_TARGET_HAS_v256 have_avx2 + +#define TCG_TARGET_HAS_andc_vec 1 +#define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_not_vec 0 +#define TCG_TARGET_HAS_neg_vec 0 + #define TCG_TARGET_deposit_i32_valid(ofs, len) \ (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \ ((ofs) == 0 && (len) == 16)) diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index 63d27f10e7..e9a4d92598 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -28,9 +28,14 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { #if TCG_TARGET_REG_BITS == 64 "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi", - "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15", #else "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi", +#endif + "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15", + "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7", +#if TCG_TARGET_REG_BITS == 64 + "%xmm8", "%xmm9", "%xmm10", "%xmm11", + "%xmm12", "%xmm13", "%xmm14", "%xmm15", #endif }; #endif @@ -60,6 +65,28 @@ static const int tcg_target_reg_alloc_order[] = { TCG_REG_ECX, TCG_REG_EDX, TCG_REG_EAX, +#endif + TCG_REG_XMM0, + TCG_REG_XMM1, + TCG_REG_XMM2, + TCG_REG_XMM3, + TCG_REG_XMM4, + TCG_REG_XMM5, +#ifndef _WIN64 + /* The Win64 ABI has xmm6-xmm15 as caller-saves, and we do not save + any of them. Therefore only allow xmm0-xmm5 to be allocated. */ + TCG_REG_XMM6, + TCG_REG_XMM7, +#if TCG_TARGET_REG_BITS == 64 + TCG_REG_XMM8, + TCG_REG_XMM9, + TCG_REG_XMM10, + TCG_REG_XMM11, + TCG_REG_XMM12, + TCG_REG_XMM13, + TCG_REG_XMM14, + TCG_REG_XMM15, +#endif #endif }; @@ -94,7 +121,7 @@ static const int tcg_target_call_oarg_regs[] = { #define TCG_CT_CONST_I32 0x400 #define TCG_CT_CONST_WSZ 0x800 -/* Registers used with L constraint, which are the first argument +/* Registers used with L constraint, which are the first argument registers on x86_64, and two random call clobbered registers on i386. */ #if TCG_TARGET_REG_BITS == 64 @@ -125,6 +152,8 @@ static bool have_cmov; it there. Therefore we always define the variable. */ bool have_bmi1; bool have_popcnt; +bool have_avx1; +bool have_avx2; #ifdef CONFIG_CPUID_H static bool have_movbe; @@ -148,6 +177,8 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type, if (value != (int32_t)value) { tcg_abort(); } + /* FALLTHRU */ + case R_386_32: tcg_patch32(code_ptr, value); break; case R_386_PC8: @@ -162,6 +193,14 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type, } } +#if TCG_TARGET_REG_BITS == 64 +#define ALL_GENERAL_REGS 0x0000ffffu +#define ALL_VECTOR_REGS 0xffff0000u +#else +#define ALL_GENERAL_REGS 0x000000ffu +#define ALL_VECTOR_REGS 0x00ff0000u +#endif + /* parse target specific constraints */ static const char *target_parse_constraint(TCGArgConstraint *ct, const char *ct_str, TCGType type) @@ -192,21 +231,29 @@ static const char *target_parse_constraint(TCGArgConstraint *ct, tcg_regset_set_reg(ct->u.regs, TCG_REG_EDI); break; case 'q': + /* A register that can be used as a byte operand. */ ct->ct |= TCG_CT_REG; ct->u.regs = TCG_TARGET_REG_BITS == 64 ? 0xffff : 0xf; break; case 'Q': + /* A register with an addressable second byte (e.g. %ah). */ ct->ct |= TCG_CT_REG; ct->u.regs = 0xf; break; case 'r': + /* A general register. */ ct->ct |= TCG_CT_REG; - ct->u.regs = TCG_TARGET_REG_BITS == 64 ? 0xffff : 0xff; + ct->u.regs |= ALL_GENERAL_REGS; break; case 'W': /* With TZCNT/LZCNT, we can have operand-size as an input. */ ct->ct |= TCG_CT_CONST_WSZ; break; + case 'x': + /* A vector register. */ + ct->ct |= TCG_CT_REG; + ct->u.regs |= ALL_VECTOR_REGS; + break; /* qemu_ld/st address constraint */ case 'L': @@ -277,8 +324,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, # define P_REXB_RM 0 # define P_GS 0 #endif -#define P_SIMDF3 0x10000 /* 0xf3 opcode prefix */ -#define P_SIMDF2 0x20000 /* 0xf2 opcode prefix */ +#define P_SIMDF3 0x20000 /* 0xf3 opcode prefix */ +#define P_SIMDF2 0x40000 /* 0xf2 opcode prefix */ +#define P_VEXL 0x80000 /* Set VEX.L = 1 */ #define OPC_ARITH_EvIz (0x81) #define OPC_ARITH_EvIb (0x83) @@ -310,11 +358,38 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_MOVL_Iv (0xb8) #define OPC_MOVBE_GyMy (0xf0 | P_EXT38) #define OPC_MOVBE_MyGy (0xf1 | P_EXT38) +#define OPC_MOVD_VyEy (0x6e | P_EXT | P_DATA16) +#define OPC_MOVD_EyVy (0x7e | P_EXT | P_DATA16) +#define OPC_MOVDDUP (0x12 | P_EXT | P_SIMDF2) +#define OPC_MOVDQA_VxWx (0x6f | P_EXT | P_DATA16) +#define OPC_MOVDQA_WxVx (0x7f | P_EXT | P_DATA16) +#define OPC_MOVDQU_VxWx (0x6f | P_EXT | P_SIMDF3) +#define OPC_MOVDQU_WxVx (0x7f | P_EXT | P_SIMDF3) +#define OPC_MOVQ_VqWq (0x7e | P_EXT | P_SIMDF3) +#define OPC_MOVQ_WqVq (0xd6 | P_EXT | P_DATA16) #define OPC_MOVSBL (0xbe | P_EXT) #define OPC_MOVSWL (0xbf | P_EXT) #define OPC_MOVSLQ (0x63 | P_REXW) #define OPC_MOVZBL (0xb6 | P_EXT) #define OPC_MOVZWL (0xb7 | P_EXT) +#define OPC_PADDB (0xfc | P_EXT | P_DATA16) +#define OPC_PADDW (0xfd | P_EXT | P_DATA16) +#define OPC_PADDD (0xfe | P_EXT | P_DATA16) +#define OPC_PADDQ (0xd4 | P_EXT | P_DATA16) +#define OPC_PAND (0xdb | P_EXT | P_DATA16) +#define OPC_PANDN (0xdf | P_EXT | P_DATA16) +#define OPC_PCMPEQB (0x74 | P_EXT | P_DATA16) +#define OPC_POR (0xeb | P_EXT | P_DATA16) +#define OPC_PSHUFD (0x70 | P_EXT | P_DATA16) +#define OPC_PSUBB (0xf8 | P_EXT | P_DATA16) +#define OPC_PSUBW (0xf9 | P_EXT | P_DATA16) +#define OPC_PSUBD (0xfa | P_EXT | P_DATA16) +#define OPC_PSUBQ (0xfb | P_EXT | P_DATA16) +#define OPC_PUNPCKLBW (0x60 | P_EXT | P_DATA16) +#define OPC_PUNPCKLWD (0x61 | P_EXT | P_DATA16) +#define OPC_PUNPCKLDQ (0x62 | P_EXT | P_DATA16) +#define OPC_PUNPCKLQDQ (0x6c | P_EXT | P_DATA16) +#define OPC_PXOR (0xef | P_EXT | P_DATA16) #define OPC_POP_r32 (0x58) #define OPC_POPCNT (0xb8 | P_EXT | P_SIMDF3) #define OPC_PUSH_r32 (0x50) @@ -330,6 +405,11 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_SHRX (0xf7 | P_EXT38 | P_SIMDF2) #define OPC_TESTL (0x85) #define OPC_TZCNT (0xbc | P_EXT | P_SIMDF3) +#define OPC_VPBROADCASTB (0x78 | P_EXT38 | P_DATA16) +#define OPC_VPBROADCASTW (0x79 | P_EXT38 | P_DATA16) +#define OPC_VPBROADCASTD (0x58 | P_EXT38 | P_DATA16) +#define OPC_VPBROADCASTQ (0x59 | P_EXT38 | P_DATA16) +#define OPC_VZEROUPPER (0x77 | P_EXT) #define OPC_XCHG_ax_r32 (0x90) #define OPC_GRP3_Ev (0xf7) @@ -479,11 +559,20 @@ static void tcg_out_modrm(TCGContext *s, int opc, int r, int rm) tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } -static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) +static void tcg_out_vex_opc(TCGContext *s, int opc, int r, int v, + int rm, int index) { int tmp; - if ((opc & (P_REXW | P_EXT | P_EXT38)) || (rm & 8)) { + /* Use the two byte form if possible, which cannot encode + VEX.W, VEX.B, VEX.X, or an m-mmmm field other than P_EXT. */ + if ((opc & (P_EXT | P_EXT38 | P_REXW)) == P_EXT + && ((rm | index) & 8) == 0) { + /* Two byte VEX prefix. */ + tcg_out8(s, 0xc5); + + tmp = (r & 8 ? 0 : 0x80); /* VEX.R */ + } else { /* Three byte VEX prefix. */ tcg_out8(s, 0xc4); @@ -493,20 +582,17 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) } else if (opc & P_EXT) { tmp = 1; } else { - tcg_abort(); + g_assert_not_reached(); } - tmp |= 0x40; /* VEX.X */ - tmp |= (r & 8 ? 0 : 0x80); /* VEX.R */ - tmp |= (rm & 8 ? 0 : 0x20); /* VEX.B */ + tmp |= (r & 8 ? 0 : 0x80); /* VEX.R */ + tmp |= (index & 8 ? 0 : 0x40); /* VEX.X */ + tmp |= (rm & 8 ? 0 : 0x20); /* VEX.B */ tcg_out8(s, tmp); - tmp = (opc & P_REXW ? 0x80 : 0); /* VEX.W */ - } else { - /* Two byte VEX prefix. */ - tcg_out8(s, 0xc5); - - tmp = (r & 8 ? 0 : 0x80); /* VEX.R */ + tmp = (opc & P_REXW ? 0x80 : 0); /* VEX.W */ } + + tmp |= (opc & P_VEXL ? 0x04 : 0); /* VEX.L */ /* VEX.pp */ if (opc & P_DATA16) { tmp |= 1; /* 0x66 */ @@ -518,6 +604,11 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) tmp |= (~v & 15) << 3; /* VEX.vvvv */ tcg_out8(s, tmp); tcg_out8(s, opc); +} + +static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) +{ + tcg_out_vex_opc(s, opc, r, v, rm, 0); tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } @@ -526,8 +617,8 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) mode for absolute addresses, ~RM is the size of the immediate operand that will follow the instruction. */ -static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, - int index, int shift, intptr_t offset) +static void tcg_out_sib_offset(TCGContext *s, int r, int rm, int index, + int shift, intptr_t offset) { int mod, len; @@ -538,7 +629,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, intptr_t pc = (intptr_t)s->code_ptr + 5 + ~rm; intptr_t disp = offset - pc; if (disp == (int32_t)disp) { - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (LOWREGMASK(r) << 3) | 5); tcg_out32(s, disp); return; @@ -548,7 +638,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, use of the MODRM+SIB encoding and is therefore larger than rip-relative addressing. */ if (offset == (int32_t)offset) { - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (4 << 3) | 5); tcg_out32(s, offset); @@ -556,10 +645,9 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, } /* ??? The memory isn't directly addressable. */ - tcg_abort(); + g_assert_not_reached(); } else { /* Absolute address. */ - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (r << 3) | 5); tcg_out32(s, offset); return; @@ -582,7 +670,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, that would be used for %esp is the escape to the two byte form. */ if (index < 0 && LOWREGMASK(rm) != TCG_REG_ESP) { /* Single byte MODRM format. */ - tcg_out_opc(s, opc, r, rm, 0); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } else { /* Two byte MODRM+SIB format. */ @@ -596,7 +683,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, tcg_debug_assert(index != TCG_REG_ESP); } - tcg_out_opc(s, opc, r, rm, index); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (shift << 6) | (LOWREGMASK(index) << 3) | LOWREGMASK(rm)); } @@ -608,6 +694,21 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, } } +static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, + int index, int shift, intptr_t offset) +{ + tcg_out_opc(s, opc, r, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + +static void tcg_out_vex_modrm_sib_offset(TCGContext *s, int opc, int r, int v, + int rm, int index, int shift, + intptr_t offset) +{ + tcg_out_vex_opc(s, opc, r, v, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + /* A simplification of the above with no index or shift. */ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, int rm, intptr_t offset) @@ -615,6 +716,30 @@ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, tcg_out_modrm_sib_offset(s, opc, r, rm, -1, 0, offset); } +static inline void tcg_out_vex_modrm_offset(TCGContext *s, int opc, int r, + int v, int rm, intptr_t offset) +{ + tcg_out_vex_modrm_sib_offset(s, opc, r, v, rm, -1, 0, offset); +} + +/* Output an opcode with an expected reference to the constant pool. */ +static inline void tcg_out_modrm_pool(TCGContext *s, int opc, int r) +{ + tcg_out_opc(s, opc, r, 0, 0); + /* Absolute for 32-bit, pc-relative for 64-bit. */ + tcg_out8(s, LOWREGMASK(r) << 3 | 5); + tcg_out32(s, 0); +} + +/* Output an opcode with an expected reference to the constant pool. */ +static inline void tcg_out_vex_modrm_pool(TCGContext *s, int opc, int r) +{ + tcg_out_vex_opc(s, opc, r, 0, 0, 0); + /* Absolute for 32-bit, pc-relative for 64-bit. */ + tcg_out8(s, LOWREGMASK(r) << 3 | 5); + tcg_out32(s, 0); +} + /* Generate dest op= src. Uses the same ARITH_* codes as tgen_arithi. */ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) { @@ -625,12 +750,114 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src); } -static inline void tcg_out_mov(TCGContext *s, TCGType type, - TCGReg ret, TCGReg arg) +static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) +{ + int rexw = 0; + + if (arg == ret) { + return; + } + switch (type) { + case TCG_TYPE_I64: + rexw = P_REXW; + /* fallthru */ + case TCG_TYPE_I32: + if (ret < 16) { + if (arg < 16) { + tcg_out_modrm(s, OPC_MOVL_GvEv + rexw, ret, arg); + } else { + tcg_out_vex_modrm(s, OPC_MOVD_EyVy + rexw, ret, 0, arg); + } + } else { + if (arg < 16) { + tcg_out_vex_modrm(s, OPC_MOVD_VyEy + rexw, ret, 0, arg); + } else { + tcg_out_vex_modrm(s, OPC_MOVQ_VqWq, ret, 0, arg); + } + } + break; + + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_vex_modrm(s, OPC_MOVQ_VqWq, ret, 0, arg); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_vex_modrm(s, OPC_MOVDQA_VxWx, ret, 0, arg); + break; + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_vex_modrm(s, OPC_MOVDQA_VxWx | P_VEXL, ret, 0, arg); + break; + + default: + g_assert_not_reached(); + } +} + +static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg r, TCGReg a) { - if (arg != ret) { - int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm(s, opc, ret, arg); + if (have_avx2) { + static const int dup_insn[4] = { + OPC_VPBROADCASTB, OPC_VPBROADCASTW, + OPC_VPBROADCASTD, OPC_VPBROADCASTQ, + }; + int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0); + tcg_out_vex_modrm(s, dup_insn[vece] + vex_l, r, 0, a); + } else { + switch (vece) { + case MO_8: + /* ??? With zero in a register, use PSHUFB. */ + tcg_out_vex_modrm(s, OPC_PUNPCKLBW, r, 0, a); + a = r; + /* FALLTHRU */ + case MO_16: + tcg_out_vex_modrm(s, OPC_PUNPCKLWD, r, 0, a); + a = r; + /* FALLTHRU */ + case MO_32: + tcg_out_vex_modrm(s, OPC_PSHUFD, r, 0, a); + /* imm8 operand: all output lanes selected from input lane 0. */ + tcg_out8(s, 0); + break; + case MO_64: + tcg_out_vex_modrm(s, OPC_PUNPCKLQDQ, r, 0, a); + break; + default: + g_assert_not_reached(); + } + } +} + +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, + TCGReg ret, tcg_target_long arg) +{ + int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0); + + if (arg == 0) { + tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret); + return; + } + if (arg == -1) { + tcg_out_vex_modrm(s, OPC_PCMPEQB + vex_l, ret, ret, ret); + return; + } + + if (TCG_TARGET_REG_BITS == 64) { + if (have_avx2) { + tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret); + } else { + tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret); + } + new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4); + } else if (have_avx2) { + tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret); + new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0); + } else { + tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy, ret); + new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0); + tcg_out_dup_vec(s, type, MO_32, ret, ret); } } @@ -639,6 +866,25 @@ static void tcg_out_movi(TCGContext *s, TCGType type, { tcg_target_long diff; + switch (type) { + case TCG_TYPE_I32: +#if TCG_TARGET_REG_BITS == 64 + case TCG_TYPE_I64: +#endif + if (ret < 16) { + break; + } + /* fallthru */ + case TCG_TYPE_V64: + case TCG_TYPE_V128: + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16); + tcg_out_dupi_vec(s, type, ret, arg); + return; + default: + g_assert_not_reached(); + } + if (arg == 0) { tgen_arithr(s, ARITH_XOR, ret, ret); return; @@ -667,6 +913,59 @@ static void tcg_out_movi(TCGContext *s, TCGType type, tcg_out64(s, arg); } +static void tcg_out_movi_vec(TCGContext *s, TCGType type, + TCGReg ret, const TCGArg *a) +{ + int n = (64 / TCG_TARGET_REG_BITS) << (type - TCG_TYPE_V64); + int opc, ofs, rel; + + tcg_debug_assert(ret >= 16); + tcg_debug_assert(type >= TCG_TYPE_V64); + + /* We assume that INDEX_op_dupi could not be used and therefore + we must use a constant pool entry. */ + + switch (type) { + case TCG_TYPE_V64: + opc = OPC_MOVQ_VqWq; + break; + case TCG_TYPE_V128: + opc = OPC_MOVDQU_VxWx; + break; + case TCG_TYPE_V256: + opc = OPC_MOVDQU_VxWx | P_VEXL; + break; + default: + g_assert_not_reached(); + } + tcg_out_vex_modrm_pool(s, opc, ret); + + if (TCG_TARGET_REG_BITS == 64) { + rel = R_386_PC32, ofs = -4; + } else { + rel = R_386_32, ofs = 0; + } + switch (n) { + case 1: + new_pool_label(s, a[0], rel, s->code_ptr - 4, rel); + break; + case 2: + new_pool_l2(s, rel, s->code_ptr - 4, ofs, a[0], a[1]); + break; + case 4: + new_pool_l4(s, rel, s->code_ptr - 4, ofs, a[0], a[1], a[2], a[3]); + break; +#if TCG_TARGET_REG_BITS == 32 + case 8: + new_pool_l8(s, rel, s->code_ptr - 4, ofs, + a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]); + break; +#endif + default: + g_assert_not_reached(); + } +} + static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val) { if (val == (int8_t)val) { @@ -702,18 +1001,74 @@ static inline void tcg_out_pop(TCGContext *s, int reg) tcg_out_opc(s, OPC_POP_r32 + LOWREGMASK(reg), 0, reg, 0); } -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg arg1, intptr_t arg2) { - int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm_offset(s, opc, ret, arg1, arg2); + switch (type) { + case TCG_TYPE_I32: + if (ret < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_GvEv, ret, arg1, arg2); + } else { + tcg_out_vex_modrm_offset(s, OPC_MOVD_VyEy, ret, 0, arg1, arg2); + } + break; + case TCG_TYPE_I64: + if (ret < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_GvEv | P_REXW, ret, arg1, arg2); + break; + } + /* FALLTHRU */ + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2); + break; + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL, + ret, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, + TCGReg arg1, intptr_t arg2) { - int opc = OPC_MOVL_EvGv + (type == TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm_offset(s, opc, arg, arg1, arg2); + switch (type) { + case TCG_TYPE_I32: + if (arg < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_EvGv, arg, arg1, arg2); + } else { + tcg_out_vex_modrm_offset(s, OPC_MOVD_EyVy, arg, 0, arg1, arg2); + } + break; + case TCG_TYPE_I64: + if (arg < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_REXW, arg, arg1, arg2); + break; + } + /* FALLTHRU */ + case TCG_TYPE_V64: + tcg_debug_assert(arg >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_debug_assert(arg >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2); + break; + case TCG_TYPE_V256: + tcg_debug_assert(arg >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL, + arg, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -725,6 +1080,8 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, return false; } rexw = P_REXW; + } else if (type != TCG_TYPE_I32) { + return false; } tcg_out_modrm_offset(s, OPC_MOVL_EvIz | rexw, 0, base, ofs); tcg_out32(s, val); @@ -2259,8 +2616,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, break; case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: + case INDEX_op_mov_vec: case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi. */ case INDEX_op_movi_i64: + case INDEX_op_dupi_vec: case INDEX_op_call: /* Always emitted via tcg_out_call. */ default: tcg_abort(); @@ -2269,6 +2628,73 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, #undef OP_32_64 } +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, + unsigned vecl, unsigned vece, + const TCGArg *args, const int *const_args) +{ + static int const add_insn[4] = { + OPC_PADDB, OPC_PADDW, OPC_PADDD, OPC_PADDQ + }; + static int const sub_insn[4] = { + OPC_PSUBB, OPC_PSUBW, OPC_PSUBD, OPC_PSUBQ + }; + + TCGType type = vecl + TCG_TYPE_V64; + int insn; + TCGArg a0, a1, a2; + + a0 = args[0]; + a1 = args[1]; + a2 = args[2]; + + switch (opc) { + case INDEX_op_add_vec: + insn = add_insn[vece]; + goto gen_simd; + case INDEX_op_sub_vec: + insn = sub_insn[vece]; + goto gen_simd; + case INDEX_op_and_vec: + insn = OPC_PAND; + goto gen_simd; + case INDEX_op_or_vec: + insn = OPC_POR; + goto gen_simd; + case INDEX_op_xor_vec: + insn = OPC_PXOR; + gen_simd: + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a1, a2); + break; + + case INDEX_op_andc_vec: + insn = OPC_PANDN; + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a2, a1); + break; + + case INDEX_op_ld_vec: + tcg_out_ld(s, type, a0, a1, a2); + break; + case INDEX_op_st_vec: + tcg_out_st(s, type, a0, a1, a2); + break; + case INDEX_op_movi_vec: + tcg_out_movi_vec(s, type, a0, args + 1); + break; + case INDEX_op_dup_vec: + tcg_out_dup_vec(s, type, vece, a0, a1); + break; + + default: + g_assert_not_reached(); + } +} + static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) { static const TCGTargetOpDef r = { .args_ct_str = { "r" } }; @@ -2292,6 +2718,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) = { .args_ct_str = { "r", "r", "L", "L" } }; static const TCGTargetOpDef L_L_L_L = { .args_ct_str = { "L", "L", "L", "L" } }; + static const TCGTargetOpDef x_x = { .args_ct_str = { "x", "x" } }; + static const TCGTargetOpDef x_x_x = { .args_ct_str = { "x", "x", "x" } }; + static const TCGTargetOpDef x_r = { .args_ct_str = { "x", "r" } }; switch (op) { case INDEX_op_goto_ptr: @@ -2493,6 +2922,20 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) return &s2; } + case INDEX_op_ld_vec: + case INDEX_op_st_vec: + return &x_r; + + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + return &x_x_x; + case INDEX_op_dup_vec: + return &x_x; + default: break; } @@ -2577,6 +3020,9 @@ static void tcg_target_qemu_prologue(TCGContext *s) tcg_out_addi(s, TCG_REG_CALL_STACK, stack_addend); + if (have_avx2) { + tcg_out_vex_opc(s, OPC_VZEROUPPER, 0, 0, 0, 0); + } for (i = ARRAY_SIZE(tcg_target_callee_save_regs) - 1; i >= 0; i--) { tcg_out_pop(s, tcg_target_callee_save_regs[i]); } @@ -2598,9 +3044,16 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count) static void tcg_target_init(TCGContext *s) { #ifdef CONFIG_CPUID_H - unsigned a, b, c, d; + unsigned a, b, c, d, b7 = 0; int max = __get_cpuid_max(0, 0); + if (max >= 7) { + /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ + __cpuid_count(7, 0, a, b7, c, d); + have_bmi1 = (b7 & bit_BMI) != 0; + have_bmi2 = (b7 & bit_BMI2) != 0; + } + if (max >= 1) { __cpuid(1, a, b, c, d); #ifndef have_cmov @@ -2609,17 +3062,22 @@ static void tcg_target_init(TCGContext *s) available, we'll use a small forward branch. */ have_cmov = (d & bit_CMOV) != 0; #endif + /* MOVBE is only available on Intel Atom and Haswell CPUs, so we need to probe for it. */ have_movbe = (c & bit_MOVBE) != 0; have_popcnt = (c & bit_POPCNT) != 0; - } - if (max >= 7) { - /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ - __cpuid_count(7, 0, a, b, c, d); - have_bmi1 = (b & bit_BMI) != 0; - have_bmi2 = (b & bit_BMI2) != 0; + /* There are a number of things we must check before we can be + sure of not hitting invalid opcode. */ + if (c & bit_OSXSAVE) { + unsigned xcrl, xcrh; + asm ("xgetbv" : "=a" (xcrl), "=d" (xcrh) : "c" (0)); + if ((xcrl & 6) == 6) { + have_avx1 = (c & bit_AVX) != 0; + have_avx2 = (b7 & bit_AVX2) != 0; + } + } } max = __get_cpuid_max(0x8000000, 0); @@ -2630,11 +3088,16 @@ static void tcg_target_init(TCGContext *s) } #endif /* CONFIG_CPUID_H */ + tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS; if (TCG_TARGET_REG_BITS == 64) { - tcg_target_available_regs[TCG_TYPE_I32] = 0xffff; - tcg_target_available_regs[TCG_TYPE_I64] = 0xffff; - } else { - tcg_target_available_regs[TCG_TYPE_I32] = 0xff; + tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS; + } + if (have_avx1) { + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS; + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS; + } + if (have_avx2) { + tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS; } tcg_target_call_clobber_regs = 0; From patchwork Mon Dec 18 17:17:41 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122245 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3103321qgn; Mon, 18 Dec 2017 09:22:13 -0800 (PST) X-Google-Smtp-Source: ACJfBosckWap6w0MsyRhPPJUFRy3IdOmb45vBbHywQkVBJSKcivRKBnS5Iv1eD3lerfFV8fLrtxd X-Received: by 10.129.169.194 with SMTP id g185mr391453ywh.420.1513617733913; Mon, 18 Dec 2017 09:22:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617733; cv=none; d=google.com; s=arc-20160816; b=Kw/Uq6MgkOyMYgEl+G8GOcx8X9UGhoNtwFTpxsLUtivmtQRNyzTcQ4wMDcnrcXtwUL PEq5grhqkwe98R21h0I/gr1OSuoIvj9FmIF3ec1VhwXbJK4rTOgjouNqiKug/ow5yvCY 9TM06GapIwOoMoUkHGR8bCKVB7UvB6Q4joqHBcVMFPPpCo4iBvacEcPwdRQcWR7E4sLt DT+nbfTuAWCXDr4U8qkcpLsaYh1SH9BPse58UwechXtMRJIf5ONtglY08j08pRRGZ6pt TjDD/Mn0/vs6wC48wHCBO/uLLi0Is9LS+AzOoS/EnbXtwcGJy/byR4f9wVT8xejrzy6r 5ojw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=/GgjKGXHrQaYbQicWKE6S3D9Cp7sWgPEK7F5sH0PF2I=; b=NqSVaYmDEGbNK2TXpe0sYWOZfgwTWqBDu1A1CcF4RDhlYrL6w52Y6JYvU312Jl5BOk /TSPoAlyFB8hGtQhyVLMG231vZUO1JEeKTqrF9v1HVHXMvcZ9UohnZOKPtzK8MK5RDLC UVehJqE+en/Eyf9+hbvMbXvlFxrIAplzYZgJEsMfKORHO441QMmk8tbR8MZc0gkzqwGe oLhR3NBj7jH9hcA+ZOar1QruxiKWR6XDdHWdWW32lEULeiVGEyjg2eOUtSXkAfPONGeE tugfzgYTMpmCObKbYPakwtes/lltRLK52KkEWBcRsAw4XdcQy6CxNe/Hl26DWSBU4sYS LHiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=F9ah4O3A; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id t84si2527310ywf.135.2017.12.18.09.22.13 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:22:13 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=F9ah4O3A; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58455 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz7F-0003Iq-C8 for patch@linaro.org; Mon, 18 Dec 2017 12:22:13 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37034) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3V-0000OC-U3 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3Q-00010q-Es for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:21 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:44969) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3Q-0000zg-6v for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:16 -0500 Received: by mail-pl0-x244.google.com with SMTP id n13so5193364plp.11 for ; Mon, 18 Dec 2017 09:18:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=/GgjKGXHrQaYbQicWKE6S3D9Cp7sWgPEK7F5sH0PF2I=; b=F9ah4O3Ao1lelNyK93Wqnilt4YwczA3AoRXE0J9fhFspvfFKaRe885lT7/jcCd5P/i +jKykItG7vi6XKO5fLsHUI9Mx6OP8BcAezSubIfDGbraZAOFq3nwAKTaNa9MNnU2Uqj/ b6HWJsHa+Lm0m9uKrH73qcBPGAfQWNaSx726s= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=/GgjKGXHrQaYbQicWKE6S3D9Cp7sWgPEK7F5sH0PF2I=; b=tawFjgkmiJL+ARAPQNAIWovmTT1Qlvw+89OJySATXTfZRho9PFPG0GAY7RoJnfbBEa rf1/afFjPtSIUVauf8oJyHKLXY4q8WC8Z2Ehzy5rr6NECW+Ly6tsUW864ePY2oy3ARHE 7cIXll+3n5q3NyOXdImydgKROuGHdy4gXTBWwKErKA/Rvx3t7HG8OPOnOMtNvQwpBkmS SoXp8ntQ/QDQ4WjjBMkX13yJXjYq7eJY9Fjdq3LBmTfI7o5PpPrQvVGUR9+0WmA/A7mW G5DuZYoG/lIvDK84butzLODDcD0rUCDM0wMZLOe2uvqcUqy7HVireOrHqaxiZznYrGzN JZyA== X-Gm-Message-State: AKGB3mIjQ1gW6REAdD6llu1kC5SXGWHWRNP4PT08WJTu+J/G2QwVlEt7 udQUB1EZSsbZrxhAqqCLDy6ZLxqodzk= X-Received: by 10.84.252.148 with SMTP id y20mr384437pll.245.1513617494881; Mon, 18 Dec 2017 09:18:14 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.13 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:13 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:41 -0800 Message-Id: <20171218171758.16964-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v7 09/26] tcg: Add tcg_expand_vec_op and tcg-target.opc.h X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" These will be useful in the next few patches adding shifts, permutes, and multiplication. Signed-off-by: Richard Henderson --- tcg/i386/tcg-target.opc.h | 3 +++ tcg/tcg-opc.h | 6 ++++++ tcg/tcg.h | 15 +++++++++++++++ tcg/i386/tcg-target.inc.c | 21 +++++++++++++++++++++ tcg/tcg.c | 13 ++++++++++--- 5 files changed, 55 insertions(+), 3 deletions(-) create mode 100644 tcg/i386/tcg-target.opc.h -- 2.14.3 diff --git a/tcg/i386/tcg-target.opc.h b/tcg/i386/tcg-target.opc.h new file mode 100644 index 0000000000..4816a6c3d4 --- /dev/null +++ b/tcg/i386/tcg-target.opc.h @@ -0,0 +1,3 @@ +/* Target-specific opcodes for host vector expansion. These will be + emitted by tcg_expand_vec_op. For those familiar with GCC internals, + consider these to be UNSPEC with names. */ diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index 4e62eda14b..b4e16cfbc3 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -229,6 +229,12 @@ DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) +DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) + +#if TCG_TARGET_MAYBE_vec +#include "tcg-target.opc.h" +#endif + #undef TLADDR_ARGS #undef DATA64_ARGS #undef IMPL diff --git a/tcg/tcg.h b/tcg/tcg.h index dce483b0ee..0e9d79bdd4 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -1207,6 +1207,21 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr); void tcg_register_jit(void *buf, size_t buf_size); +#if TCG_TARGET_MAYBE_vec +/* Return zero if the tuple (opc, type, vece) is unsupportable; + return > 0 if it is directly supportable; + return < 0 if we must call tcg_expand_vec_op. */ +int tcg_can_emit_vec_op(TCGOpcode, TCGType, unsigned); +#else +static inline int tcg_can_emit_vec_op(TCGOpcode o, TCGType t, unsigned ve) +{ + return 0; +} +#endif + +/* Expand the tuple (opc, type, vece) on the given arguments. */ +void tcg_expand_vec_op(TCGOpcode, TCGType, unsigned, TCGArg, ...); + /* * Memory helpers that will be used by TCG generated code. */ diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index e9a4d92598..062cf16607 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -2942,6 +2942,27 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) return NULL; } +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) +{ + switch (opc) { + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + return true; + + default: + return false; + } +} + +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg a0, ...) +{ +} + static const int tcg_target_callee_save_regs[] = { #if TCG_TARGET_REG_BITS == 64 TCG_REG_RBP, diff --git a/tcg/tcg.c b/tcg/tcg.c index 16b8faf66f..b4f8938fb0 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1404,10 +1404,10 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; - case NB_OPS: - break; + default: + tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS); + return true; } - g_assert_not_reached(); } /* Note: we convert the 64 bit args to 32 bit and do some alignment @@ -3737,3 +3737,10 @@ void tcg_register_jit(void *buf, size_t buf_size) { } #endif /* ELF_HOST_MACHINE */ + +#if !TCG_TARGET_MAYBE_vec +void tcg_expand_vec_op(TCGOpcode o, TCGType t, unsigned e, TCGArg a0, ...) +{ + g_assert_not_reached(); +} +#endif From patchwork Mon Dec 18 17:17:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122257 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3113119qgn; Mon, 18 Dec 2017 09:31:52 -0800 (PST) X-Google-Smtp-Source: ACJfBos9pShdVtxkHFvlzo04dkrXhFstZSu4iP/tmaUST+f6/zpc0Ztz4WlFwI3vh3Vl4MhzYClQ X-Received: by 10.129.229.68 with SMTP id c4mr370243ywm.503.1513618311982; Mon, 18 Dec 2017 09:31:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618311; cv=none; d=google.com; s=arc-20160816; b=ypEMbBUcRCoOy5fPNFIm7FzTmDvaPZo9Et8mTLBPkF1kudtI7/hxbeahziyqR3afWF sK1R1EQ9AhUZFGJw6tkvzDYfFw8D1ic9N5bQw+DWe153p6nvcMNps3v/niWuxvmdGYIb 80et2sI7c18gQH4LyNOXljnEh/rWewSojYHU96Po7/SB8tObOLESQOfiimWPfNF/Fd4I 6iDv2ghzmUZDWxFGDXqd29vmKDT0bYILa1HvPRUAoW5Q0H8qqaiBFodDoCQHfU9ePhnJ 3QxBiG6+PwJYLCAmnhAUL6Lh6YabPyJFzImKzPWvN7ITa38/hsbGOspGln2vs69tEViT HJyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=vYdpLfmhC9S2eebmbcKP8GI9s6Lsp0zgfH/q7sOlirg=; b=riFFUY2ZVoPJaxoFwMSuiR/SYR+y1Wj0bLO1DzBbuKstIlQSiyjPBnxNXh57d67SoT USdugq+ga0dczj122QpenJwp940gDDIAuHMc8GnDlguZWGZYXQTd99PajgNN8tCECicp G+BmceIwvfVq6VEwHNZNiLnqq1spJwu6LwiC6M95wpOjyaQDOmOIraFseqvgYArfRN9y +8Twb6ZckncgPEHWrX4u6tViRlgsTqIDkOv/X3BL/HJixOTFI5SptOvSwORQXp/JLrmU gpw+myNNjY63hJfJH7jbXO08K9cLgjXEDlbzpAOHwNtVelQjTBg//Z2xxFkXZMLmXoOO wX5g== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=I36gB3GK; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id 143si1402942ywx.210.2017.12.18.09.31.51 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:31:51 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=I36gB3GK; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58615 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzGZ-0003Ki-DI for patch@linaro.org; Mon, 18 Dec 2017 12:31:51 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37218) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3b-0000Ux-H8 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3S-000137-4t for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:27 -0500 Received: from mail-pl0-x242.google.com ([2607:f8b0:400e:c01::242]:44968) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3R-00011z-Q9 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:18 -0500 Received: by mail-pl0-x242.google.com with SMTP id n13so5193407plp.11 for ; Mon, 18 Dec 2017 09:18:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=vYdpLfmhC9S2eebmbcKP8GI9s6Lsp0zgfH/q7sOlirg=; b=I36gB3GK3HI6367bhvtFiohbFI0A/3/H81EAX47pZI9lApyls4+ZR7JbmLwqfRl9hE oFicx12E1A3BLgW9trwtfgtZnOvwe/02xT01lsnz6WVkEH6NX4isA2WKfH0S1QTJI5nY 1tyIIAJP9uWvdELy2UQWXpFaR9VT5zaDmSNug= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=vYdpLfmhC9S2eebmbcKP8GI9s6Lsp0zgfH/q7sOlirg=; b=YE2Q3jOXy8F/kNeV16KTNwnwOW8Iw3mJXoXdjLXfZalDkKSZB/aAOmVKTSg+qqJ5eh LameGTMcC1kXatmIgyK8F+YPAN6bcJc41DY8y2G9YnGWpXqvFZwrcYM0nRRoew4qWS2S vM1+AjSgrtmaIT6ySXC9Ay5T+cOpgLoxu9FcgjxO8w4FGgsvPcTqBvtICfQHCi+tS0G8 SJSHcwYhbhp6e2KPsmA1NMpz3ceyL5L+0czzaGcF0vXsShpl6Qm/nRLzxVB7lVvpsMb3 qeWIAvtgdG9zfmXwZwvYlXLWJ9lLguXfI22EONLa4R82I6OvOg+s2dgfbHpJH53NESZc 14UQ== X-Gm-Message-State: AKGB3mI7uLxrL0fqSJYpAFHedI2YrbUd/MmdB+d9OlC6t76nj8pOjPp9 vJVO0p5DXHkRPq2ewfdzQSeswL2qybM= X-Received: by 10.84.224.5 with SMTP id r5mr378511plj.341.1513617496245; Mon, 18 Dec 2017 09:18:16 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.14 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:15 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:42 -0800 Message-Id: <20171218171758.16964-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::242 Subject: [Qemu-devel] [PATCH v7 10/26] tcg: Add generic vector ops for interleave X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Includes zip, unzip, and transform. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 15 ++ tcg/i386/tcg-target.h | 3 + tcg/tcg-op-gvec.h | 17 +++ tcg/tcg-op.h | 6 + tcg/tcg-opc.h | 7 + tcg/tcg.h | 3 + accel/tcg/tcg-runtime-gvec.c | 78 ++++++++++ tcg/tcg-op-gvec.c | 337 ++++++++++++++++++++++++++++++++++++++++++- tcg/tcg-op-vec.c | 55 +++++++ tcg/tcg.c | 9 ++ tcg/README | 40 +++++ 11 files changed, 562 insertions(+), 8 deletions(-) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index 76ee41ce58..c6de749134 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -163,3 +163,18 @@ DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_zip8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_zip16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_zip32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_zip64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_uzp8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_uzp16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_uzp32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_uzp64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_trn8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_trn16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_trn32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_trn64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index f9d3fc4a93..ff0ad7dcdb 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -177,6 +177,9 @@ extern bool have_avx2; #define TCG_TARGET_HAS_orc_vec 0 #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_zip_vec 0 +#define TCG_TARGET_HAS_uzp_vec 0 +#define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_deposit_i32_valid(ofs, len) \ (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \ diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 95739946ff..64270a3c74 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -66,6 +66,8 @@ typedef struct { gen_helper_gvec_2 *fno; /* The opcode, if any, to which this corresponds. */ TCGOpcode opc; + /* The data argument to the out-of-line helper. */ + uint32_t data; /* The vector element size, if applicable. */ uint8_t vece; /* Prefer i64 to v64. */ @@ -83,6 +85,8 @@ typedef struct { gen_helper_gvec_3 *fno; /* The opcode, if any, to which this corresponds. */ TCGOpcode opc; + /* The data argument to the out-of-line helper. */ + uint32_t data; /* The vector element size, if applicable. */ uint8_t vece; /* Prefer i64 to v64. */ @@ -133,6 +137,19 @@ void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x); void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x); void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x); +void tcg_gen_gvec_zipl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_ziph(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_uzpe(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_uzpo(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); + /* * 64-bit vector operations. Use these when the register has been allocated * with tcg_global_mem_new_i64, and so we cannot also address it via pointer. diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 5f49785cb3..733e29b5f8 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -927,6 +927,12 @@ void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a); void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_zipl_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_ziph_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_uzpe_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index b4e16cfbc3..c911d62442 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -229,6 +229,13 @@ DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) +DEF(zipl_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) +DEF(ziph_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) +DEF(uzpe_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) +DEF(uzpo_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) +DEF(trne_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) +DEF(trno_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) + DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) #if TCG_TARGET_MAYBE_vec diff --git a/tcg/tcg.h b/tcg/tcg.h index 0e9d79bdd4..d0678ac769 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -178,6 +178,9 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_andc_vec 0 #define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_zip_vec 0 +#define TCG_TARGET_HAS_uzp_vec 0 +#define TCG_TARGET_HAS_trn_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index cd1ce12b7e..628df811b2 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -293,3 +293,81 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) } clear_high(d, oprsz, desc); } + +/* The size of the alloca in the following is currently bounded to 2k. */ + +#define DO_ZIP(NAME, TYPE) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t oprsz_2 = oprsz / 2; \ + intptr_t i; \ + /* We produce output faster than we consume input. \ + Therefore we must be mindful of possible overlap. */ \ + if (unlikely((a - d) < (uintptr_t)oprsz)) { \ + void *a_new = alloca(oprsz_2); \ + memcpy(a_new, a, oprsz_2); \ + a = a_new; \ + } \ + if (unlikely((b - d) < (uintptr_t)oprsz)) { \ + void *b_new = alloca(oprsz_2); \ + memcpy(b_new, b, oprsz_2); \ + b = b_new; \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ + *(TYPE *)(d + 2 * i + 0) = *(TYPE *)(a + i); \ + *(TYPE *)(d + 2 * i + sizeof(TYPE)) = *(TYPE *)(b + i); \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_ZIP(gvec_zip8, uint8_t) +DO_ZIP(gvec_zip16, uint16_t) +DO_ZIP(gvec_zip32, uint32_t) +DO_ZIP(gvec_zip64, uint64_t) + +#define DO_UZP(NAME, TYPE) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t oprsz_2 = oprsz / 2; \ + intptr_t odd_ofs = simd_data(desc); \ + intptr_t i; \ + if (unlikely((b - d) < (uintptr_t)oprsz)) { \ + void *b_new = alloca(oprsz); \ + memcpy(b_new, b, oprsz); \ + b = b_new; \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ + *(TYPE *)(d + i) = *(TYPE *)(a + 2 * i + odd_ofs); \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ + *(TYPE *)(d + oprsz_2 + i) = *(TYPE *)(b + 2 * i + odd_ofs); \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_UZP(gvec_uzp8, uint8_t) +DO_UZP(gvec_uzp16, uint16_t) +DO_UZP(gvec_uzp32, uint32_t) +DO_UZP(gvec_uzp64, uint64_t) + +#define DO_TRN(NAME, TYPE) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t odd_ofs = simd_data(desc); \ + intptr_t i; \ + for (i = 0; i < oprsz; i += 2 * sizeof(TYPE)) { \ + TYPE ae = *(TYPE *)(a + i + odd_ofs); \ + TYPE be = *(TYPE *)(b + i + odd_ofs); \ + *(TYPE *)(d + i + 0) = ae; \ + *(TYPE *)(d + i + sizeof(TYPE)) = be; \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_TRN(gvec_trn8, uint8_t) +DO_TRN(gvec_trn16, uint16_t) +DO_TRN(gvec_trn32, uint32_t) +DO_TRN(gvec_trn64, uint64_t) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 120e301096..6e2e84c210 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -548,7 +548,8 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, use a cl-sized store to implement the clearing without an extra store operation. This is true for aarch64 and x86_64 hosts. */ - if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); expand_2_vec(g->vece, dofs, aofs, done, 32, TCG_TYPE_V256, g->fniv); dofs += done; @@ -557,7 +558,8 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, maxsz -= done; } - if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); expand_2_vec(g->vece, dofs, aofs, done, 16, TCG_TYPE_V128, g->fniv); dofs += done; @@ -568,7 +570,9 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, if (check_size_impl(oprsz, 8)) { uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); - if (TCG_TARGET_HAS_v64 && !g->prefer_i64) { + if (TCG_TARGET_HAS_v64 && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { expand_2_vec(g->vece, dofs, aofs, done, 8, TCG_TYPE_V64, g->fniv); } else if (g->fni8) { expand_2_i64(dofs, aofs, done, g->fni8); @@ -598,7 +602,7 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, } do_ool: - tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, g->fno); + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno); } /* Expand a vector three-operand operation. */ @@ -621,7 +625,8 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, use a cl-sized store to implement the clearing without an extra store operation. This is true for aarch64 and x86_64 hosts. */ - if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); expand_3_vec(g->vece, dofs, aofs, bofs, done, 32, TCG_TYPE_V256, g->load_dest, g->fniv); @@ -632,7 +637,8 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, maxsz -= done; } - if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); expand_3_vec(g->vece, dofs, aofs, bofs, done, 16, TCG_TYPE_V128, g->load_dest, g->fniv); @@ -645,7 +651,9 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, if (check_size_impl(oprsz, 8)) { uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); - if (TCG_TARGET_HAS_v64 && !g->prefer_i64) { + if (TCG_TARGET_HAS_v64 && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { expand_3_vec(g->vece, dofs, aofs, bofs, done, 8, TCG_TYPE_V64, g->load_dest, g->fniv); } else if (g->fni8) { @@ -678,7 +686,7 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, } do_ool: - tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, g->fno); + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, g->data, g->fno); } /* @@ -1097,3 +1105,316 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, }; tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g); } + +static void do_zip(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz, + bool high) +{ + static gen_helper_gvec_3 * const zip_fns[4] = { + gen_helper_gvec_zip8, + gen_helper_gvec_zip16, + gen_helper_gvec_zip32, + gen_helper_gvec_zip64, + }; + + TCGType type; + uint32_t step, i, n; + TCGOpcode zip_op; + + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, oprsz); + tcg_debug_assert(vece <= MO_64); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > 4 * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + zip_op = high ? INDEX_op_ziph_vec : INDEX_op_zipl_vec; + + /* Since these operations don't operate in lock-step lanes, + we must care for overlap. */ + if (TCG_TARGET_HAS_v256 && oprsz % 32 == 0 && oprsz / 32 <= 8 + && tcg_can_emit_vec_op(zip_op, TCG_TYPE_V256, vece)) { + type = TCG_TYPE_V256; + step = 32; + n = oprsz / 32; + } else if (TCG_TARGET_HAS_v128 && oprsz % 16 == 0 && oprsz / 16 <= 8 + && tcg_can_emit_vec_op(zip_op, TCG_TYPE_V128, vece)) { + type = TCG_TYPE_V128; + step = 16; + n = oprsz / 16; + } else if (TCG_TARGET_HAS_v64 && oprsz % 8 == 0 && oprsz / 8 <= 8 + && tcg_can_emit_vec_op(zip_op, TCG_TYPE_V64, vece)) { + type = TCG_TYPE_V64; + step = 8; + n = oprsz / 8; + } else { + goto do_ool; + } + + if (n == 1) { + TCGv_vec t1 = tcg_temp_new_vec(type); + TCGv_vec t2 = tcg_temp_new_vec(type); + + tcg_gen_ld_vec(t1, cpu_env, aofs); + tcg_gen_ld_vec(t2, cpu_env, bofs); + if (high) { + tcg_gen_ziph_vec(vece, t1, t1, t2); + } else { + tcg_gen_zipl_vec(vece, t1, t1, t2); + } + tcg_gen_st_vec(t1, cpu_env, dofs); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + } else { + TCGv_vec ta[4], tb[4], tmp; + + if (high) { + aofs += oprsz / 2; + bofs += oprsz / 2; + } + + for (i = 0; i < (n / 2 + n % 2); ++i) { + ta[i] = tcg_temp_new_vec(type); + tb[i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(ta[i], cpu_env, aofs + i * step); + tcg_gen_ld_vec(tb[i], cpu_env, bofs + i * step); + } + + tmp = tcg_temp_new_vec(type); + for (i = 0; i < n; ++i) { + if (i & 1) { + tcg_gen_ziph_vec(vece, tmp, ta[i / 2], tb[i / 2]); + } else { + tcg_gen_zipl_vec(vece, tmp, ta[i / 2], tb[i / 2]); + } + tcg_gen_st_vec(tmp, cpu_env, dofs + i * step); + } + tcg_temp_free_vec(tmp); + + for (i = 0; i < (n / 2 + n % 2); ++i) { + tcg_temp_free_vec(ta[i]); + tcg_temp_free_vec(tb[i]); + } + } + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + return; + + do_ool: + if (high) { + aofs += oprsz / 2; + bofs += oprsz / 2; + } + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, zip_fns[vece]); +} + +void tcg_gen_gvec_zipl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_zip(vece, dofs, aofs, bofs, oprsz, maxsz, false); +} + +void tcg_gen_gvec_ziph(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_zip(vece, dofs, aofs, bofs, oprsz, maxsz, true); +} + +static void do_uzp(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz, bool odd) +{ + static gen_helper_gvec_3 * const uzp_fns[4] = { + gen_helper_gvec_uzp8, + gen_helper_gvec_uzp16, + gen_helper_gvec_uzp32, + gen_helper_gvec_uzp64, + }; + + TCGType type; + uint32_t step, i, n; + TCGv_vec t[8]; + TCGOpcode uzp_op; + + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, oprsz); + tcg_debug_assert(vece <= MO_64); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > 4 * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + uzp_op = odd ? INDEX_op_uzpo_vec : INDEX_op_uzpe_vec; + + /* Since these operations don't operate in lock-step lanes, + we must care for overlap. */ + if (TCG_TARGET_HAS_v256 && oprsz % 32 == 0 && oprsz / 32 <= 4 + && tcg_can_emit_vec_op(uzp_op, TCG_TYPE_V256, vece)) { + type = TCG_TYPE_V256; + step = 32; + n = oprsz / 32; + } else if (TCG_TARGET_HAS_v128 && oprsz % 16 == 0 && oprsz / 16 <= 4 + && tcg_can_emit_vec_op(uzp_op, TCG_TYPE_V128, vece)) { + type = TCG_TYPE_V128; + step = 16; + n = oprsz / 16; + } else if (TCG_TARGET_HAS_v64 && oprsz % 8 == 0 && oprsz / 8 <= 4 + && tcg_can_emit_vec_op(uzp_op, TCG_TYPE_V64, vece)) { + type = TCG_TYPE_V64; + step = 8; + n = oprsz / 8; + } else { + goto do_ool; + } + + for (i = 0; i < n; ++i) { + t[i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(t[i], cpu_env, aofs + i * step); + } + for (i = 0; i < n; ++i) { + t[n + i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(t[n + i], cpu_env, bofs + i * step); + } + for (i = 0; i < n; ++i) { + if (odd) { + tcg_gen_uzpo_vec(vece, t[2 * i], t[2 * i], t[2 * i + 1]); + } else { + tcg_gen_uzpe_vec(vece, t[2 * i], t[2 * i], t[2 * i + 1]); + } + tcg_gen_st_vec(t[2 * i], cpu_env, dofs + i * step); + tcg_temp_free_vec(t[2 * i]); + tcg_temp_free_vec(t[2 * i + 1]); + } + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + return; + + do_ool: + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, + (1 << vece) * odd, uzp_fns[vece]); +} + +void tcg_gen_gvec_uzpe(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_uzp(vece, dofs, aofs, bofs, oprsz, maxsz, false); +} + +void tcg_gen_gvec_uzpo(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_uzp(vece, dofs, aofs, bofs, oprsz, maxsz, true); +} + +static void gen_trne8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0x00ff00ff00ff00ffull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shli_i64(b, b, 8); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trne16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0x0000ffff0000ffffull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shli_i64(b, b, 16); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trne32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_deposit_i64(d, a, b, 32, 32); +} + +void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = gen_trne8_i64, + .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn8, + .opc = INDEX_op_trne_vec, + .vece = MO_8 }, + { .fni8 = gen_trne16_i64, + .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn16, + .opc = INDEX_op_trne_vec, + .vece = MO_16 }, + { .fni8 = gen_trne32_i64, + .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn32, + .opc = INDEX_op_trne_vec, + .vece = MO_32 }, + { .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn64, + .opc = INDEX_op_trne_vec, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +static void gen_trno8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0xff00ff00ff00ff00ull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shri_i64(a, a, 8); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trno16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0xffff0000ffff0000ull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shri_i64(a, a, 16); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trno32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_shri_i64(a, a, 32); + tcg_gen_deposit_i64(d, b, a, 0, 32); +} + +void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = gen_trno8_i64, + .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn8, + .opc = INDEX_op_trno_vec, + .data = 1, + .vece = MO_8 }, + { .fni8 = gen_trno16_i64, + .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn16, + .opc = INDEX_op_trno_vec, + .data = 2, + .vece = MO_16 }, + { .fni8 = gen_trno32_i64, + .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn32, + .opc = INDEX_op_trno_vec, + .data = 4, + .vece = MO_32 }, + { .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn64, + .opc = INDEX_op_trno_vec, + .data = 8, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index 5cfe4af6bd..a5d0ff89c3 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -380,3 +380,58 @@ void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a) tcg_temp_free_vec(t); } } + +static void do_interleave(TCGOpcode opc, unsigned vece, + TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGArg bi = temp_arg(bt); + TCGType type = rt->base_type; + unsigned vecl = type - TCG_TYPE_V64; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + tcg_debug_assert((8 << vece) <= (32 << vecl)); + can = tcg_can_emit_vec_op(opc, type, vece); + if (can > 0) { + vec_gen_3(opc, type, vece, ri, ai, bi); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(opc, type, vece, ri, ai, bi); + } +} + +void tcg_gen_zipl_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_zipl_vec, vece, r, a, b); +} + +void tcg_gen_ziph_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_ziph_vec, vece, r, a, b); +} + +void tcg_gen_uzpe_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_uzpe_vec, vece, r, a, b); +} + +void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_uzpo_vec, vece, r, a, b); +} + +void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_trne_vec, vece, r, a, b); +} + +void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_trno_vec, vece, r, a, b); +} diff --git a/tcg/tcg.c b/tcg/tcg.c index b4f8938fb0..33e3c03cbc 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1403,6 +1403,15 @@ bool tcg_op_supported(TCGOpcode op) return have_vec && TCG_TARGET_HAS_andc_vec; case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + return have_vec && TCG_TARGET_HAS_zip_vec; + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + return have_vec && TCG_TARGET_HAS_uzp_vec; + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + return have_vec && TCG_TARGET_HAS_trn_vec; default: tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS); diff --git a/tcg/README b/tcg/README index e14990fb9b..8ab8d3ab7e 100644 --- a/tcg/README +++ b/tcg/README @@ -561,6 +561,46 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. Similarly, logical operations with and without compliment. Note that VECE is unused. +* zipl_vec v0, v1, v2 +* ziph_vec v0, v1, v2 + + "Zip" two vectors together, either the low half of v1/v2 or the high half. + The name comes from ARM ARM; the equivalent function in Intel terminology + is the less scrutable "punpck". The effect is + + part = ("high" ? VECL/VECE/2 : 0); + for (i = 0; i < VECL/VECE/2; ++i) { + v0[2i + 0] = v1[i + part]; + v0[2i + 1] = v2[i + part]; + } + +* uzpe_vec v0, v1, v2 +* uzpo_vec v0, v1, v2 + + "Unzip" two vectors, either the even elements or the odd elements. + If v1 and v2 are the result of zipl and ziph, this performs the + inverse operation. The effect is + + part = ("odd" ? 1 : 0) + for (i = 0; i < VECL/VECE/2; ++i) { + v0[i] = v1[2i + part]; + } + for (i = 0; i < VECL/VECE/2; ++i) { + v0[i + VECL/VECE/2] = v1[2i + part]; + } + +* trne_vec v0, v1, v2 +* trno_vec v1, v1, v2 + + "Transpose" two vectors, either the even elements or the odd elements. + The effect is + + part = ("odd" ? 1 : 0) + for (i = 0; i < VECL/VECE/2; ++i) { + v0[2i + 0] = v1[2i + part]; + v0[2i + 1] = v2[2i + part]; + } + ********* Note 1: Some shortcuts are defined when the last operand is known to be From patchwork Mon Dec 18 17:17:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122247 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3105775qgn; Mon, 18 Dec 2017 09:24:40 -0800 (PST) X-Google-Smtp-Source: ACJfBosUaGD3s6YExOCP9ppGYvsakuuMQWT+R7ZVhrzDrGK9Www7xS+WZmCpY7m+4NLAM/zBNv6O X-Received: by 10.13.199.130 with SMTP id j124mr417594ywd.44.1513617880040; Mon, 18 Dec 2017 09:24:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617880; cv=none; d=google.com; s=arc-20160816; b=wM/gj9ISFefnNkpSXh/fenNs8YlFKndTa7Ad0Bt6D3gozzrm/Ou2Y6Tw7YLE4BOT2l 1/kMU4L9bhBOuYd22ToZVeGB8OKidsPZoLjnI2DJ4fbWliPLLPazFxEYZaPPtMEmpaOi IEDiWulFTKWrV2JmLt5bI+R/H5LT0+w/JcjIo7OucT0vkU0f6tflrVAuv56drG/xQhDR 0w3c8/vBX58XeUli+BYc82nWcr5LKmX6Zn3OKAuVk8eqPZbarYf1hsVEd1oZ2diMtcPh ge7Y57IPaPBZa59rJx5kevDXqKuO0qWq+AADGyTYpOoUMhKPYMELjlMgkP0oP4go3DYK FKUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=fglbqt0ZZi+UTUzzw2VU5fDhQFOJREmbY828m0tqWIs=; b=AcEO1TH5b630HhRV8j93labuJPCrrGXwOHrxRvuCjy1yjc4WKpUQbVoNDrZQZa3sJB 20EOAiFt4rS5mtmxBlff+B368J4twV+hlOzsJyPtRvfHcuz0plLkQ/B8vi/1YWvharF3 BmbZWtO+NIdidDRXEk7NlD7Mngii81uUj/czx6psnzUTdMsGKaqfpdC/2f0eEgG35lAq oS6XddB7yQxSsot9PdlwNe8E+HEapLGuzzKP7VN9/nHu8l5Qb3ulzidf1E02rUOS2u6F mL9hLI/2cAQYrbHEwe0SoBymqKb8HSyfeb9uVjVzkhNyIGqVUIwFMlIfv1XRtwH5bdo4 vk8w== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=VdceXt5b; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id 128si2556097ybw.612.2017.12.18.09.24.39 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:24:40 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=VdceXt5b; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58558 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz9b-0005aB-Fm for patch@linaro.org; Mon, 18 Dec 2017 12:24:39 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37198) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3a-0000TX-IZ for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:31 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3T-00015Q-DL for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:26 -0500 Received: from mail-pg0-x241.google.com ([2607:f8b0:400e:c05::241]:36822) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3T-00013s-5S for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:19 -0500 Received: by mail-pg0-x241.google.com with SMTP id k134so9402740pga.3 for ; Mon, 18 Dec 2017 09:18:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=fglbqt0ZZi+UTUzzw2VU5fDhQFOJREmbY828m0tqWIs=; b=VdceXt5bRZDs39SGuw4cgQfOkNHMnDVX4RUXOsKVYQXJ/75ZfvmRQ5TWAZaWnnvi+a eSnlBUOVIJZhx+Kpv41QPgzyiocWu/HVBojfQTeijhIj6ULw9KkOFFZwpfN23kH9Zacg wFb7AWEgQs+KA26+OyB0vOapCN9PJWRHpMfYU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=fglbqt0ZZi+UTUzzw2VU5fDhQFOJREmbY828m0tqWIs=; b=Pvuhe/ss1Pq1w39ygeRn7WLZRIGhAfE5Cpy2GyyCDdG94RPaIA0oL8zd+0y3oYxytX 4SiqtaxiDXunqzyN+BXY81SS5tyfJU9wc+wTvSyKLdVBQC4tsuYKPTLTxxEWTpt9yL5X vOquoqLbd4D1GodIeVMJfjfOqn26vhHpR+iHaIruoZj/v7u94WXQW2pLkft4tp+B1GNZ QkowztNou5Ci+flNoaSFeBijoZV3V9WifdApejIG40Q1PEsKYujt6rhx/4IbQbK3dMCL kaCKH/XSUKNf+pZCXltEBxRUIocwSBbCtew6rw9KOQHfxceEOKjS6RQ+nyAbuqVyAu8E 2EUQ== X-Gm-Message-State: AKGB3mIMFhBlA+GLuZobw48lULsSP7ppVLSdAODLtlRpXVdGxBo6ANwq OD+PvZ6UFQXYMDRKNRQcS9bE/zaAu1g= X-Received: by 10.98.89.4 with SMTP id n4mr390119pfb.133.1513617497835; Mon, 18 Dec 2017 09:18:17 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.16 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:16 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:43 -0800 Message-Id: <20171218171758.16964-12-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::241 Subject: [Qemu-devel] [PATCH v7 11/26] target/arm: Use vector infrastructure for aa64 zip/uzp/trn/xtn X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 103 +++++++++++++++------------------------------ 1 file changed, 35 insertions(+), 68 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 55a4902fc2..8769b4505a 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -5576,11 +5576,7 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t insn) int opcode = extract32(insn, 12, 2); bool part = extract32(insn, 14, 1); bool is_q = extract32(insn, 30, 1); - int esize = 8 << size; - int i, ofs; - int datasize = is_q ? 128 : 64; - int elements = datasize / esize; - TCGv_i64 tcg_res, tcg_resl, tcg_resh; + GVecGen3Fn *gvec_fn; if (opcode == 0 || (size == 3 && !is_q)) { unallocated_encoding(s); @@ -5591,60 +5587,24 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t insn) return; } - tcg_resl = tcg_const_i64(0); - tcg_resh = tcg_const_i64(0); - tcg_res = tcg_temp_new_i64(); - - for (i = 0; i < elements; i++) { - switch (opcode) { - case 1: /* UZP1/2 */ - { - int midpoint = elements / 2; - if (i < midpoint) { - read_vec_element(s, tcg_res, rn, 2 * i + part, size); - } else { - read_vec_element(s, tcg_res, rm, - 2 * (i - midpoint) + part, size); - } - break; - } - case 2: /* TRN1/2 */ - if (i & 1) { - read_vec_element(s, tcg_res, rm, (i & ~1) + part, size); - } else { - read_vec_element(s, tcg_res, rn, (i & ~1) + part, size); - } - break; - case 3: /* ZIP1/2 */ - { - int base = part * elements / 2; - if (i & 1) { - read_vec_element(s, tcg_res, rm, base + (i >> 1), size); - } else { - read_vec_element(s, tcg_res, rn, base + (i >> 1), size); - } - break; - } - default: - g_assert_not_reached(); - } - - ofs = i * esize; - if (ofs < 64) { - tcg_gen_shli_i64(tcg_res, tcg_res, ofs); - tcg_gen_or_i64(tcg_resl, tcg_resl, tcg_res); - } else { - tcg_gen_shli_i64(tcg_res, tcg_res, ofs - 64); - tcg_gen_or_i64(tcg_resh, tcg_resh, tcg_res); - } + switch (opcode) { + case 1: /* UZP1/2 */ + gvec_fn = part ? tcg_gen_gvec_uzpo : tcg_gen_gvec_uzpe; + break; + case 2: /* TRN1/2 */ + gvec_fn = part ? tcg_gen_gvec_trno : tcg_gen_gvec_trne; + break; + case 3: /* ZIP1/2 */ + gvec_fn = part ? tcg_gen_gvec_ziph : tcg_gen_gvec_zipl; + break; + default: + g_assert_not_reached(); } - tcg_temp_free_i64(tcg_res); - - write_vec_element(s, tcg_resl, rd, 0, MO_64); - tcg_temp_free_i64(tcg_resl); - write_vec_element(s, tcg_resh, rd, 1, MO_64); - tcg_temp_free_i64(tcg_resh); + gvec_fn(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); } static void do_minmaxop(DisasContext *s, TCGv_i32 tcg_elt1, TCGv_i32 tcg_elt2, @@ -7922,6 +7882,22 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, int destelt = is_q ? 2 : 0; int passes = scalar ? 1 : 2; + if (opcode == 0x12 && !u) { /* XTN, XTN2 */ + tcg_debug_assert(!scalar); + if (is_q) { /* XTN2 */ + tcg_gen_gvec_uzpe(size, vec_reg_offset(s, rd, 1, MO_64), + vec_reg_offset(s, rn, 0, MO_64), + vec_reg_offset(s, rn, 1, MO_64), + 8, vec_full_reg_size(s) - 8); + } else { + tcg_gen_gvec_uzpe(size, vec_reg_offset(s, rd, 0, MO_64), + vec_reg_offset(s, rn, 0, MO_64), + vec_reg_offset(s, rn, 1, MO_64), + 8, vec_full_reg_size(s)); + } + return; + } + if (scalar) { tcg_res[1] = tcg_const_i32(0); } @@ -7939,23 +7915,14 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, tcg_res[pass] = tcg_temp_new_i32(); switch (opcode) { - case 0x12: /* XTN, SQXTUN */ + case 0x12: /* , SQXTUN */ { - static NeonGenNarrowFn * const xtnfns[3] = { - gen_helper_neon_narrow_u8, - gen_helper_neon_narrow_u16, - tcg_gen_extrl_i64_i32, - }; static NeonGenNarrowEnvFn * const sqxtunfns[3] = { gen_helper_neon_unarrow_sat8, gen_helper_neon_unarrow_sat16, gen_helper_neon_unarrow_sat32, }; - if (u) { - genenvfn = sqxtunfns[size]; - } else { - genfn = xtnfns[size]; - } + genenvfn = sqxtunfns[size]; break; } case 0x14: /* SQXTN, UQXTN */ From patchwork Mon Dec 18 17:17:44 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122254 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3112264qgn; Mon, 18 Dec 2017 09:31:09 -0800 (PST) X-Google-Smtp-Source: ACJfBouehBQ20+vLLGZBcgF+adGgvL7SK7pbo5alwrmynu58/lVUvdMviR3MIAICEvPh4nEbwZIp X-Received: by 10.37.61.197 with SMTP id k188mr403404yba.493.1513618269693; Mon, 18 Dec 2017 09:31:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618269; cv=none; d=google.com; s=arc-20160816; b=lLTBNTiGR4rQuDRrf0Uncvmw5ODabnKqOKzXP4eJNf5/AgWX+ShT6Yt7o2KeO2ZHr5 nYujQFV9gzeZDM8eTT/clWSf9sefYdECWznboE/P5U2/YoacbJwR22+GZbP1Ntuqirtp vot11afPTXNIvlAcw63STR0UapA/8mwdQdzRnS5y1gL3AWQb7vdeWc9cjW33WDGcWQ2t KvklgnSkVYHPQAJI4er0FVYxQFm8HkkwFPXR59Yc1iZ/va+Ys9661v7QHQv+rq5DGia5 zfyxw3oRTIOsCV9pcM9DRsV01bvY8UrPecnCym+QczSS39RWkhxNmE3w4cTS3XfnvMsN +jgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=MghJsyUHsWbY63TzbKt+UEpfAA9auzZ2jxUMpkHiIg8=; b=fib/wQFXZp+61g6DfWQpNHGNkyGj6WbAFdixPTBNM+ZNTqfa9ttnXcjFkhYc3wKyfk CC5K0dlgkzBbsuAmeINSzTAx88x0oqyz3cJThr3DYC9rtSbv+UYF/hiwgAtZZV/+DSGa 2OcXbVyw/2+7LOaJfvEzH41AYkrzNS3kmLa7E2s9N0xytWZxP3Q4rFLWF2viNJVUX9ya aUNuxuzi6aGUSZ1tuxd/cmzHk0nb8gt+Tq2O132rq7W8ejSDUXxUCAB79sNJx4OQ4Lzf rSbVbtTwh6v9v6/vGW2QRCPW7+ieHOa5deoB9+AyYDHHHS8nGtNEh4qKrE7v6sO9DHiS T4xA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=LlQ04CmQ; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id 206si1379882ywb.329.2017.12.18.09.31.09 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:31:09 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=LlQ04CmQ; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58594 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzFt-0001kP-3b for patch@linaro.org; Mon, 18 Dec 2017 12:31:09 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37307) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3e-0000Xm-1D for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:36 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3V-00018g-G7 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:29 -0500 Received: from mail-pf0-x243.google.com ([2607:f8b0:400e:c00::243]:41712) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3V-00017i-40 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:21 -0500 Received: by mail-pf0-x243.google.com with SMTP id j28so9930738pfk.8 for ; Mon, 18 Dec 2017 09:18:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=MghJsyUHsWbY63TzbKt+UEpfAA9auzZ2jxUMpkHiIg8=; b=LlQ04CmQ9rEfnf9Cj4uFLLAGO5d89dpyxv5aGD8zJ0ms56ZHjWX3EWrg4I6ZJnMwXn eVbWnZ8ZsURQ9OYmNrlUvtwwONDJGD4IUO78O2gz9kympWlR+FW4uPgpFzM+y5mkUWhb 0nEY7LC0/CdfN1UqS5QKTcgp3WG7mrz/BQLqY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=MghJsyUHsWbY63TzbKt+UEpfAA9auzZ2jxUMpkHiIg8=; b=MyYDnXohBnif8ajIdqJuq9hRnZTWIT/mjBQQvCWgt4ORuXMJLg2ejWdDx2UI5hsNfC Yy7YBDLDVhigiz9TT+mCEACCcE+VfEeQNAaupKHJczkgUwJ473CTBBiPUIdWSxkVFTrm 9CBaTAsYR5hBXwXeIiPckfpNUfb+4pnAbpxqK9KJLqC4P4AHr30rmxTBuNhs1NfymTx0 17KsBZI7dw/K1xu2UCeDaJQWXTEviyfgMMm+AJoE2vrVtbfCNgt6wRlBHPaTD4YfWjeE aXxNKZKzOeCT5OwnnskeqPg1PCudqEWZYZdI/uEFBj2hywvrKvkMCQfIZmeNBmem4eD1 lDMw== X-Gm-Message-State: AKGB3mKLr+4kql0hgvjskcl07uKnF7zUsUjjRoWUuakDiewWPan7lKQW 70ALgfoailmwzidLCz5taDJXDYKxQQM= X-Received: by 10.98.75.68 with SMTP id y65mr418977pfa.78.1513617499503; Mon, 18 Dec 2017 09:18:19 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.17 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:18 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:44 -0800 Message-Id: <20171218171758.16964-13-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::243 Subject: [Qemu-devel] [PATCH v7 12/26] tcg: Add generic vector ops for constant shifts X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 15 +++ tcg/i386/tcg-target.h | 3 + tcg/tcg-op-gvec.h | 35 ++++++ tcg/tcg-op.h | 5 + tcg/tcg-opc.h | 12 ++ tcg/tcg.h | 3 + accel/tcg/tcg-runtime-gvec.c | 149 ++++++++++++++++++++++ tcg/tcg-op-gvec.c | 291 +++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 40 ++++++ tcg/tcg.c | 12 ++ tcg/README | 29 +++++ 11 files changed, 594 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index c6de749134..cb05a755b8 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -164,6 +164,21 @@ DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_shr8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_sar8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_zip8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_zip16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_zip32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index ff0ad7dcdb..92d533eb92 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -177,6 +177,9 @@ extern bool have_avx2; #define TCG_TARGET_HAS_orc_vec 0 #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 #define TCG_TARGET_HAS_zip_vec 0 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 64270a3c74..de2c0e669a 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -74,6 +74,25 @@ typedef struct { bool prefer_i64; } GVecGen2; +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, unsigned); + void (*fni4)(TCGv_i32, TCGv_i32, unsigned); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, unsigned); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_2 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; + /* Load dest as a 3rd source operand. */ + bool load_dest; +} GVecGen2i; + typedef struct { /* Expand inline as a 64-bit or 32-bit integer. Only one of these will be non-NULL. */ @@ -97,6 +116,8 @@ typedef struct { void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, uint32_t opsz, uint32_t clsz, const GVecGen2 *); +void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t opsz, + uint32_t clsz, unsigned c, const GVecGen2i *); void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz, const GVecGen3 *); @@ -137,6 +158,13 @@ void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x); void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x); void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x); +void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, unsigned shift); +void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, unsigned shift); +void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, unsigned shift); + void tcg_gen_gvec_zipl(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); void tcg_gen_gvec_ziph(unsigned vece, uint32_t dofs, uint32_t aofs, @@ -167,3 +195,10 @@ void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); + +void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned); +void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, unsigned); +void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned); +void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, unsigned); +void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned); +void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, unsigned); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 733e29b5f8..83478ab006 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -927,6 +927,11 @@ void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a); void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a); + +void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, unsigned i); +void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, unsigned i); +void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, unsigned i); + void tcg_gen_zipl_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_ziph_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_uzpe_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index c911d62442..a085fc077b 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -229,6 +229,18 @@ DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) +DEF(shli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) +DEF(shri_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) +DEF(sari_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) + +DEF(shls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) +DEF(shrs_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) +DEF(sars_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) + +DEF(shlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) +DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) +DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) + DEF(zipl_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) DEF(ziph_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) DEF(uzpe_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) diff --git a/tcg/tcg.h b/tcg/tcg.h index d0678ac769..bdfe058c45 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -178,6 +178,9 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_andc_vec 0 #define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 #define TCG_TARGET_HAS_zip_vec 0 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index 628df811b2..fba62f1192 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -36,6 +36,11 @@ typedef uint16_t vec16 __attribute__((vector_size(16))); typedef uint32_t vec32 __attribute__((vector_size(16))); typedef uint64_t vec64 __attribute__((vector_size(16))); +typedef int8_t svec8 __attribute__((vector_size(16))); +typedef int16_t svec16 __attribute__((vector_size(16))); +typedef int32_t svec32 __attribute__((vector_size(16))); +typedef int64_t svec64 __attribute__((vector_size(16))); + static inline void clear_high(void *d, intptr_t oprsz, uint32_t desc) { intptr_t maxsz = simd_maxsz(desc); @@ -294,6 +299,150 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_shl8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(svec8 *)(d + i) = *(svec8 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(svec16 *)(d + i) = *(svec16 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(svec32 *)(d + i) = *(svec32 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(svec64 *)(d + i) = *(svec64 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + /* The size of the alloca in the following is currently bounded to 2k. */ #define DO_ZIP(NAME, TYPE) \ diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 6e2e84c210..efdb009d2a 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -424,6 +424,26 @@ static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t opsz, tcg_temp_free_i32(t0); } +static void expand_2i_i32(uint32_t dofs, uint32_t aofs, uint32_t opsz, + unsigned c, bool load_dest, + void (*fni)(TCGv_i32, TCGv_i32, unsigned)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_i32(t1, cpu_env, dofs + i); + } + fni(t1, t0, c); + tcg_gen_st_i32(t1, cpu_env, dofs + i); + } + tcg_temp_free_i32(t0); + tcg_temp_free_i32(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ static void expand_3_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, bool load_dest, @@ -463,6 +483,26 @@ static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz, tcg_temp_free_i64(t0); } +static void expand_2i_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz, + unsigned c, bool load_dest, + void (*fni)(TCGv_i64, TCGv_i64, unsigned)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_i64(t1, cpu_env, dofs + i); + } + fni(t1, t0, c); + tcg_gen_st_i64(t1, cpu_env, dofs + i); + } + tcg_temp_free_i64(t0); + tcg_temp_free_i64(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ static void expand_3_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, bool load_dest, @@ -503,6 +543,29 @@ static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_temp_free_vec(t0); } +/* Expand OPSZ bytes worth of two-vector operands and an immediate operand + using host vectors. */ +static void expand_2i_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t tysz, TCGType type, + unsigned c, bool load_dest, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, unsigned)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_vec(t1, cpu_env, dofs + i); + } + fni(vece, t1, t0, c); + tcg_gen_st_vec(t1, cpu_env, dofs + i); + } + tcg_temp_free_vec(t0); + tcg_temp_free_vec(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using host vectors. */ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, @@ -605,6 +668,85 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno); } +void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + uint32_t maxsz, unsigned c, const GVecGen2i *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_2i_vec(g->vece, dofs, aofs, done, 32, TCG_TYPE_V256, + c, g->load_dest, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_2i_vec(g->vece, dofs, aofs, done, 16, TCG_TYPE_V128, + c, g->load_dest, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_2i_vec(g->vece, dofs, aofs, done, 8, TCG_TYPE_V64, + c, g->load_dest, g->fniv); + } else if (g->fni8) { + expand_2i_i64(dofs, aofs, done, c, g->load_dest, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_2i_i32(dofs, aofs, done, c, g->load_dest, g->fni4); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, g->fno); +} + /* Expand a vector three-operand operation. */ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g) @@ -1106,6 +1248,155 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g); } +void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) +{ + uint64_t mask = ((0xff << c) & 0xff) * (-1ull / 0xff); + tcg_gen_shli_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) +{ + uint64_t mask = ((0xffff << c) & 0xffff) * (-1ull / 0xffff); + tcg_gen_shli_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_shl8i_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl8i, + .opc = INDEX_op_shli_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_shl16i_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl16i, + .opc = INDEX_op_shli_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_shli_i32, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl32i, + .opc = INDEX_op_shli_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_shli_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl64i, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); +} + +void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) +{ + uint64_t mask = (0xff >> c) * (-1ull / 0xff); + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) +{ + uint64_t mask = (0xffff >> c) * (-1ull / 0xffff); + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_shr8i_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr8i, + .opc = INDEX_op_shri_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_shr16i_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr16i, + .opc = INDEX_op_shri_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_shri_i32, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr32i, + .opc = INDEX_op_shri_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_shri_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr64i, + .opc = INDEX_op_shri_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); +} + +void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) +{ + uint64_t s_mask = (0x80 >> c) * (-1ull / 0xff); + uint64_t c_mask = (0xff >> c) * (-1ull / 0xff); + TCGv_i64 s = tcg_temp_new_i64(); + + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(s, d, s_mask); /* isolate (shifted) sign bit */ + tcg_gen_muli_i64(s, s, (2 << c) - 2); /* replicate isolated signs */ + tcg_gen_andi_i64(d, d, c_mask); /* clear out bits above sign */ + tcg_gen_or_i64(d, d, s); /* include sign extension */ + tcg_temp_free_i64(s); +} + +void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) +{ + uint64_t s_mask = (0x8000 >> c) * (-1ull / 0xffff); + uint64_t c_mask = (0xffff >> c) * (-1ull / 0xffff); + TCGv_i64 s = tcg_temp_new_i64(); + + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(s, d, s_mask); /* isolate (shifted) sign bit */ + tcg_gen_andi_i64(d, d, c_mask); /* clear out bits above sign */ + tcg_gen_muli_i64(s, s, (2 << c) - 2); /* replicate isolated signs */ + tcg_gen_or_i64(d, d, s); /* include sign extension */ + tcg_temp_free_i64(s); +} + +void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_sar8i_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar8i, + .opc = INDEX_op_sari_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_sar16i_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar16i, + .opc = INDEX_op_sari_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_sari_i32, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar32i, + .opc = INDEX_op_sari_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_sari_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar64i, + .opc = INDEX_op_sari_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); +} + static void do_zip(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, bool high) diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index a5d0ff89c3..a441193b8e 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -381,6 +381,46 @@ void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a) } } +static void do_shifti(TCGOpcode opc, unsigned vece, + TCGv_vec r, TCGv_vec a, unsigned i) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGType type = rt->base_type; + unsigned vecl = type - TCG_TYPE_V64; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(i < (8 << vece)); + can = tcg_can_emit_vec_op(opc, type, vece); + if (can > 0) { + vec_gen_3(opc, type, vece, ri, ai, i); + } else { + /* We leave the choice of expansion via scalar or vector shift + to the target. Often, but not always, dupi can feed a vector + shift easier than a scalar. */ + tcg_debug_assert(can < 0); + tcg_expand_vec_op(opc, vecl, vece, ri, ai, i); + } +} + +void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, unsigned i) +{ + do_shifti(INDEX_op_shli_vec, vece, r, a, i); +} + +void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, unsigned i) +{ + do_shifti(INDEX_op_shri_vec, vece, r, a, i); +} + +void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, unsigned i) +{ + do_shifti(INDEX_op_sari_vec, vece, r, a, i); +} + static void do_interleave(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) { diff --git a/tcg/tcg.c b/tcg/tcg.c index 33e3c03cbc..f82d6e80b0 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1403,6 +1403,18 @@ bool tcg_op_supported(TCGOpcode op) return have_vec && TCG_TARGET_HAS_andc_vec; case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + return have_vec && TCG_TARGET_HAS_shi_vec; + case INDEX_op_shls_vec: + case INDEX_op_shrs_vec: + case INDEX_op_sars_vec: + return have_vec && TCG_TARGET_HAS_shs_vec; + case INDEX_op_shlv_vec: + case INDEX_op_shrv_vec: + case INDEX_op_sarv_vec: + return have_vec && TCG_TARGET_HAS_shv_vec; case INDEX_op_zipl_vec: case INDEX_op_ziph_vec: return have_vec && TCG_TARGET_HAS_zip_vec; diff --git a/tcg/README b/tcg/README index 8ab8d3ab7e..75db47922d 100644 --- a/tcg/README +++ b/tcg/README @@ -561,6 +561,35 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. Similarly, logical operations with and without compliment. Note that VECE is unused. +* shli_vec v0, v1, i2 +* shls_vec v0, v1, s2 + + Shift all elements from v1 by a scalar i2/s2. I.e. + + for (i = 0; i < VECL/VECE; ++i) { + v0[i] = v1[i] << s2; + } + +* shri_vec v0, v1, i2 +* sari_vec v0, v1, i2 +* shrs_vec v0, v1, s2 +* sars_vec v0, v1, s2 + + Similarly for logical and arithmetic right shift. + +* shlv_vec v0, v1, v2 + + Shift elements from v1 by elements from v2. I.e. + + for (i = 0; i < VECL/VECE; ++i) { + v0[i] = v1[i] << v2[i]; + } + +* shrv_vec v0, v1, v2 +* sarv_vec v0, v1, v2 + + Similarly for logical and arithmetic right shift. + * zipl_vec v0, v1, v2 * ziph_vec v0, v1, v2 From patchwork Mon Dec 18 17:17:45 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122252 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3109061qgn; Mon, 18 Dec 2017 09:27:58 -0800 (PST) X-Google-Smtp-Source: ACJfBovzwM1n16ANLH99pcyFsAO76T/+wbBBt+4ddFRg85Vqoq1vgj17k+3lgr1p81ZtDmo12aQB X-Received: by 10.129.99.4 with SMTP id x4mr380827ywb.225.1513618078570; Mon, 18 Dec 2017 09:27:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618078; cv=none; d=google.com; s=arc-20160816; b=Q2bwhBQNbKMdToFl3IHGEjOBq8rOKajU/xAzk58E7olzQhZ1lUFZHQH01WWc2vdgZG qeVVXuVslqlQSURnJttQHKRQXfSjJ7UpZVgwPOFOFCwxr2fjYVX4pYVrSi+p+nZ3x9E1 4xSkWC+yAwyPXMFoL3MAWvPu1i7YHVvDM2XT6n8v8Wwe4inKnw6aR1KSR2rTW3dcVYWp rcIwTfp1nVsq9NrujqEgVTH0NWRFabx1plWn6btenfado6iSiwvK6MQLXM6twL83EDez dRbAnkManPXew9EIVlRHFoOuWgFGIAPwGvGEeZ3xwPlfYHfvGD/oKOyYbWsnzqHmQ75s Zhlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=laXKmgFuu0JyfyjVocpo9O502Buy5yno3N0zMiOVxgg=; b=QgZuBX5FWa08oPCdjYCbvRFmmqS4wSneibH0fN2X8UNkKFvoqxgnpodJ5QfDxDVx3J vz+EtkRjHafP0l6Hdb1EH6ZX9duovhMAy6EGEl7GC8VdED8n/JN9BhN9/lVN9wfU6Xw/ //D4sYO//ddYFufRRULOG81vjF3QK5kyges16LGfpzSFr+2VhV1CPrz49ArHvmpLhBXN 9uOjjBYLcqgc6Mi1tt2INW8TLH3S0zgJAIWTud2V48xM8PpwEFFFzIFRgYUJKUQtsmwo N6WVRhK8oB8wb3pw9xFeun+o7kWQ9qft7DYiyGcAb593lqSJxkkapwYMK/mc/Y2HY2ex AKBg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=XI1Q0p/o; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id y132si2495971ywy.134.2017.12.18.09.27.58 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:27:58 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=XI1Q0p/o; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58582 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzCo-0008Tx-2S for patch@linaro.org; Mon, 18 Dec 2017 12:27:58 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37274) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3d-0000X0-6b for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3W-0001AI-Vg for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:29 -0500 Received: from mail-pf0-x243.google.com ([2607:f8b0:400e:c00::243]:34670) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3W-00019G-Ic for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:22 -0500 Received: by mail-pf0-x243.google.com with SMTP id a90so9948070pfk.1 for ; Mon, 18 Dec 2017 09:18:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=laXKmgFuu0JyfyjVocpo9O502Buy5yno3N0zMiOVxgg=; b=XI1Q0p/odceDGkWHfwRE3TMid+llK8TtcCVEb761xznZiD0w6MhtXlfuyR1giIXkwH CC+o3TQxxdoNw49sbsC2vMmUOECaS1O7aXsgJBjd5Yhhk1GrPz4wHqTBo4xYN/iZO/Dg 6TNZzFVzeP8e1HjQvrZYDAnpmMbRtFPygBNto= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=laXKmgFuu0JyfyjVocpo9O502Buy5yno3N0zMiOVxgg=; b=l0MhLGJaMShEjoy+iXjX5aRqG4qs79yS/XeaIFUBBoACboq99us1f0VzJ2qNtLnY/Q CuPgY7Zc5kXaNFLlQjJieWEdMjy78LECFfF02duX8fEp8FpGzJDYW6CGOwd+AYjdZDdV Abj5ggFRRdFLCZNVbIuL6BbPCxZb6TbrLU9EM1AiPWPZVe2VCaGQN2xBL98qgbF0nMNV mhdMZ5suRBXawNBFWY7HJVkn8HyPfVS1lCqWc6DTTC9FH5g2v+DK0wVUkJL0v5cWrS1A O3TROMrODiQF1EkHAi81GRzKPI6Agu3edAfocooS2wT9zJuObnv4qdSnSHmPqeeOMNWi ld/Q== X-Gm-Message-State: AKGB3mI9wqna2zwS6r3SBLVDnjs1ab2rFGROr8rxtOldM5zoYZzcOg0C BcSc33nFhi7Cu8Ee8tOqVqn7dy0Nu1E= X-Received: by 10.98.70.132 with SMTP id o4mr426242pfi.102.1513617501149; Mon, 18 Dec 2017 09:18:21 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.19 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:20 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:45 -0800 Message-Id: <20171218171758.16964-14-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::243 Subject: [Qemu-devel] [PATCH v7 13/26] target/arm: Use vector infrastructure for aa64 constant shifts X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 386 ++++++++++++++++++++++++++++++++++++++------- tcg/tcg-op-gvec.c | 18 ++- tcg/tcg-op-vec.c | 9 +- 3 files changed, 351 insertions(+), 62 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 8769b4505a..c47faa5633 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -6432,17 +6432,6 @@ static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src, } } -/* Common SHL/SLI - Shift left with an optional insert */ -static void handle_shli_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src, - bool insert, int shift) -{ - if (insert) { /* SLI */ - tcg_gen_deposit_i64(tcg_res, tcg_res, tcg_src, shift, 64 - shift); - } else { /* SHL */ - tcg_gen_shli_i64(tcg_res, tcg_src, shift); - } -} - /* SRI: shift right with insert */ static void handle_shri_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src, int size, int shift) @@ -6546,7 +6535,11 @@ static void handle_scalar_simd_shli(DisasContext *s, bool insert, tcg_rn = read_fp_dreg(s, rn); tcg_rd = insert ? read_fp_dreg(s, rd) : tcg_temp_new_i64(); - handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift); + if (insert) { + tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, shift, 64 - shift); + } else { + tcg_gen_shli_i64(tcg_rd, tcg_rn, shift); + } write_fp_dreg(s, rd, tcg_rd); @@ -8283,16 +8276,195 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } } +static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_vec_sar8i_i64(a, a, shift); + tcg_gen_vec_add8_i64(d, d, a); +} + +static void gen_ssra16_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_vec_sar16i_i64(a, a, shift); + tcg_gen_vec_add16_i64(d, d, a); +} + +static void gen_ssra32_i32(TCGv_i32 d, TCGv_i32 a, unsigned shift) +{ + tcg_gen_sari_i32(a, a, shift); + tcg_gen_add_i32(d, d, a); +} + +static void gen_ssra64_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_sari_i64(a, a, shift); + tcg_gen_add_i64(d, d, a); +} + +static void gen_ssra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, unsigned sh) +{ + tcg_gen_sari_vec(vece, a, a, sh); + tcg_gen_add_vec(vece, d, d, a); +} + +static void gen_usra8_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_vec_shr8i_i64(a, a, shift); + tcg_gen_vec_add8_i64(d, d, a); +} + +static void gen_usra16_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_vec_shr16i_i64(a, a, shift); + tcg_gen_vec_add16_i64(d, d, a); +} + +static void gen_usra32_i32(TCGv_i32 d, TCGv_i32 a, unsigned shift) +{ + tcg_gen_shri_i32(a, a, shift); + tcg_gen_add_i32(d, d, a); +} + +static void gen_usra64_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_shri_i64(a, a, shift); + tcg_gen_add_i64(d, d, a); +} + +static void gen_usra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, unsigned sh) +{ + tcg_gen_shri_vec(vece, a, a, sh); + tcg_gen_add_vec(vece, d, d, a); +} + +static void gen_shr8_ins_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + uint64_t mask = (0xff >> shift) * (-1ull / 0xff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shri_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shr16_ins_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + uint64_t mask = (0xffff >> shift) * (-1ull / 0xffff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shri_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shr32_ins_i32(TCGv_i32 d, TCGv_i32 a, unsigned shift) +{ + tcg_gen_shri_i32(a, a, shift); + tcg_gen_deposit_i32(d, d, a, 0, 32 - shift); +} + +static void gen_shr64_ins_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_shri_i64(a, a, shift); + tcg_gen_deposit_i64(d, d, a, 0, 64 - shift); +} + +static void gen_shr_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, unsigned sh) +{ + uint64_t mask = (2ull << ((8 << vece) - 1)) - 1; + TCGv_vec t = tcg_temp_new_vec_matching(d); + TCGv_vec m = tcg_temp_new_vec_matching(d); + + tcg_gen_dupi_vec(vece, m, mask ^ (mask >> sh)); + tcg_gen_shri_vec(vece, t, a, sh); + tcg_gen_and_vec(vece, d, d, m); + tcg_gen_or_vec(vece, d, d, t); + + tcg_temp_free_vec(t); + tcg_temp_free_vec(m); +} + /* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, int immh, int immb, int opcode, int rn, int rd) { + static const GVecGen2i ssra_op[4] = { + { .fni8 = gen_ssra8_i64, + .fniv = gen_ssra_vec, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_8 }, + { .fni8 = gen_ssra16_i64, + .fniv = gen_ssra_vec, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_16 }, + { .fni4 = gen_ssra32_i32, + .fniv = gen_ssra_vec, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_32 }, + { .fni8 = gen_ssra64_i64, + .fniv = gen_ssra_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_64 }, + }; + static const GVecGen2i usra_op[4] = { + { .fni8 = gen_usra8_i64, + .fniv = gen_usra_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_8, }, + { .fni8 = gen_usra16_i64, + .fniv = gen_usra_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_16, }, + { .fni4 = gen_usra32_i32, + .fniv = gen_usra_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_32, }, + { .fni8 = gen_usra64_i64, + .fniv = gen_usra_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_64, }, + }; + static const GVecGen2i sri_op[4] = { + { .fni8 = gen_shr8_ins_i64, + .fniv = gen_shr_ins_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_8 }, + { .fni8 = gen_shr16_ins_i64, + .fniv = gen_shr_ins_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_16 }, + { .fni4 = gen_shr32_ins_i32, + .fniv = gen_shr_ins_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_32 }, + { .fni8 = gen_shr64_ins_i64, + .fniv = gen_shr_ins_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_64 }, + }; + int size = 32 - clz32(immh) - 1; int immhb = immh << 3 | immb; int shift = 2 * (8 << size) - immhb; bool accumulate = false; - bool round = false; - bool insert = false; int dsize = is_q ? 128 : 64; int esize = 8 << size; int elements = dsize/esize; @@ -8300,6 +8472,8 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, TCGv_i64 tcg_rn = new_tmp_a64(s); TCGv_i64 tcg_rd = new_tmp_a64(s); TCGv_i64 tcg_round; + uint64_t round_const; + const GVecGen2i *gvec_op; int i; if (extract32(immh, 3, 1) && !is_q) { @@ -8318,64 +8492,141 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, switch (opcode) { case 0x02: /* SSRA / USRA (accumulate) */ - accumulate = true; - break; + if (is_u) { + /* Shift count same as element size produces zero to add. */ + if (shift == 8 << size) { + goto done; + } + gvec_op = &usra_op[size]; + } else { + /* Shift count same as element size produces all sign to add. */ + if (shift == 8 << size) { + shift -= 1; + } + gvec_op = &ssra_op[size]; + } + goto do_gvec; + case 0x08: /* SRI */ + /* Shift count same as element size is valid but does nothing. */ + if (shift == 8 << size) { + goto done; + } + gvec_op = &sri_op[size]; + do_gvec: + tcg_gen_gvec_2i(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift, gvec_op); + return; + + case 0x00: /* SSHR / USHR */ + if (is_u) { + if (shift == 8 << size) { + /* Shift count the same size as element size produces zero. */ + tcg_gen_gvec_dup8i(vec_full_reg_offset(s, rd), + is_q ? 16 : 8, vec_full_reg_size(s), 0); + } else { + tcg_gen_gvec_shri(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift); + } + } else { + /* Shift count the same size as element size produces all sign. */ + if (shift == 8 << size) { + shift -= 1; + } + tcg_gen_gvec_sari(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift); + } + return; + case 0x04: /* SRSHR / URSHR (rounding) */ - round = true; break; case 0x06: /* SRSRA / URSRA (accum + rounding) */ - accumulate = round = true; - break; - case 0x08: /* SRI */ - insert = true; + accumulate = true; break; + default: + g_assert_not_reached(); } - if (round) { - uint64_t round_const = 1ULL << (shift - 1); - tcg_round = tcg_const_i64(round_const); - } else { - tcg_round = NULL; - } + round_const = 1ULL << (shift - 1); + tcg_round = tcg_const_i64(round_const); for (i = 0; i < elements; i++) { read_vec_element(s, tcg_rn, rn, i, memop); - if (accumulate || insert) { + if (accumulate) { read_vec_element(s, tcg_rd, rd, i, memop); } - if (insert) { - handle_shri_with_ins(tcg_rd, tcg_rn, size, shift); - } else { - handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round, - accumulate, is_u, size, shift); - } + handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round, + accumulate, is_u, size, shift); write_vec_element(s, tcg_rd, rd, i, size); } + tcg_temp_free_i64(tcg_round); + done: if (!is_q) { clear_vec_high(s, rd); } +} - if (round) { - tcg_temp_free_i64(tcg_round); - } +static void gen_shl8_ins_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + uint64_t mask = ((0xff << shift) & 0xff) * (-1ull / 0xff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shli_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shl16_ins_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + uint64_t mask = ((0xffff << shift) & 0xffff) * (-1ull / 0xffff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shli_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shl32_ins_i32(TCGv_i32 d, TCGv_i32 a, unsigned shift) +{ + tcg_gen_deposit_i32(d, d, a, shift, 32 - shift); +} + +static void gen_shl64_ins_i64(TCGv_i64 d, TCGv_i64 a, unsigned shift) +{ + tcg_gen_deposit_i64(d, d, a, shift, 64 - shift); +} + +static void gen_shl_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, unsigned sh) +{ + uint64_t mask = (1ull << sh) - 1; + TCGv_vec t = tcg_temp_new_vec_matching(d); + TCGv_vec m = tcg_temp_new_vec_matching(d); + + tcg_gen_dupi_vec(vece, m, mask); + tcg_gen_shli_vec(vece, t, a, sh); + tcg_gen_and_vec(vece, d, d, m); + tcg_gen_or_vec(vece, d, d, t); + + tcg_temp_free_vec(t); + tcg_temp_free_vec(m); } /* SHL/SLI - Vector shift left */ static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, - int immh, int immb, int opcode, int rn, int rd) + int immh, int immb, int opcode, int rn, int rd) { int size = 32 - clz32(immh) - 1; int immhb = immh << 3 | immb; int shift = immhb - (8 << size); - int dsize = is_q ? 128 : 64; - int esize = 8 << size; - int elements = dsize/esize; - TCGv_i64 tcg_rn = new_tmp_a64(s); - TCGv_i64 tcg_rd = new_tmp_a64(s); - int i; if (extract32(immh, 3, 1) && !is_q) { unallocated_encoding(s); @@ -8391,19 +8642,40 @@ static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, return; } - for (i = 0; i < elements; i++) { - read_vec_element(s, tcg_rn, rn, i, size); - if (insert) { - read_vec_element(s, tcg_rd, rd, i, size); - } - - handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift); - - write_vec_element(s, tcg_rd, rd, i, size); - } - - if (!is_q) { - clear_vec_high(s, rd); + if (insert) { + static const GVecGen2i shi_op[4] = { + { .fni8 = gen_shl8_ins_i64, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_8 }, + { .fni8 = gen_shl16_ins_i64, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_16 }, + { .fni4 = gen_shl32_ins_i32, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_32 }, + { .fni8 = gen_shl64_ins_i64, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_64 }, + }; + tcg_gen_gvec_2i(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift, &shi_op[size]); + } else { + tcg_gen_gvec_shli(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift); } } diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index efdb009d2a..b29a30a78b 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1290,7 +1290,11 @@ void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs, }; tcg_debug_assert(vece <= MO_64); - tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, opsz, clsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); + } } void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) @@ -1335,7 +1339,11 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs, }; tcg_debug_assert(vece <= MO_64); - tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, opsz, clsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); + } } void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, unsigned c) @@ -1394,7 +1402,11 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs, }; tcg_debug_assert(vece <= MO_64); - tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, opsz, clsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, opsz, clsz, shift, &g[vece]); + } } static void do_zip(unsigned vece, uint32_t dofs, uint32_t aofs, diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index a441193b8e..502c5ba891 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -389,11 +389,16 @@ static void do_shifti(TCGOpcode opc, unsigned vece, TCGArg ri = temp_arg(rt); TCGArg ai = temp_arg(at); TCGType type = rt->base_type; - unsigned vecl = type - TCG_TYPE_V64; int can; tcg_debug_assert(at->base_type == type); tcg_debug_assert(i < (8 << vece)); + + if (i == 0) { + tcg_gen_mov_vec(r, a); + return; + } + can = tcg_can_emit_vec_op(opc, type, vece); if (can > 0) { vec_gen_3(opc, type, vece, ri, ai, i); @@ -402,7 +407,7 @@ static void do_shifti(TCGOpcode opc, unsigned vece, to the target. Often, but not always, dupi can feed a vector shift easier than a scalar. */ tcg_debug_assert(can < 0); - tcg_expand_vec_op(opc, vecl, vece, ri, ai, i); + tcg_expand_vec_op(opc, type, vece, ri, ai, i); } } From patchwork Mon Dec 18 17:17:46 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122258 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3115418qgn; Mon, 18 Dec 2017 09:33:58 -0800 (PST) X-Google-Smtp-Source: ACJfBotgh5GzoyIJX1w55GSNXhnja289MAMR3aanvQVn8DXBbBwL3pXlpsL2cOksIGwiTKnzlXMz X-Received: by 10.37.51.9 with SMTP id z9mr420235ybz.351.1513618438829; Mon, 18 Dec 2017 09:33:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618438; cv=none; d=google.com; s=arc-20160816; b=yRagz6zxtt0jkSib9Su6+tzWtu5wMLZb4awYSvok+tzhKggeX+B+E2ZOS9V0JnMgBm Q+CH73pfeIFQA+7Qa371pX/+JQ5HagMhDT2eCWYDk/+8g94sJCveSefGZ3QcJVJ5zunO 7WLhfVtSa7dQJgCJKiNGMFII0Q/aFtq6nPcVpycyAvENeTEs3WZ8KHl8fyt7aNTobJ2Y fOb3wtcxPE6r1pLaGR124pVTTXIhCd88Z6tYMjpjmv07T9ZQJ/jlmXSdK75Nzytp1re1 snNknhcAc6bXMsg+kRPpmkkffnag30ju92T7jKguQW/XmjnmDfr56rbqiA8rNF++M74R e1UA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=iE5KNqbtr4ZP3IjheALkSbcp/BjtAZFJJ780Ws3T5HY=; b=YZoPexb/lq3vxDnyTpVVeuPkFzmRWbnBt82d60jYVGZJslQbIiYoFKnwGO4XHF51Yn sBwsvD1ltcsddYrnIvY1kGVv85YYiuhify/lKfJh4XdhYTgyGRpS0ulnVhZhDJDTSXZD PTOcoOPnw9zxVdYayiRjmhSgme/oeVpXb5AH3htBqD4uuxRsnjGxa6bWA+AbC/dYGUBj 2eQcnzqmbIr6AWhbve3xbvGy5GXQ7y04KLN9ceDgnt69s3V943IHJ1YDuWAgRcdR8AUb GeaLkAABz11bDgpFO1EafAJfIsgT/dbqIzIfAbEF3ySPOazTEyd6ELT6SfLYD34Jdhm5 MsiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=SPaPKLPV; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id l85si2561305ybl.622.2017.12.18.09.33.58 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:33:58 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=SPaPKLPV; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58616 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzIc-00047l-7M for patch@linaro.org; Mon, 18 Dec 2017 12:33:58 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37280) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3d-0000X5-8k for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3Y-0001C7-HI for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:29 -0500 Received: from mail-pl0-x241.google.com ([2607:f8b0:400e:c01::241]:36439) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3Y-0001BA-6g for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:24 -0500 Received: by mail-pl0-x241.google.com with SMTP id b12so5186261plm.3 for ; Mon, 18 Dec 2017 09:18:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=iE5KNqbtr4ZP3IjheALkSbcp/BjtAZFJJ780Ws3T5HY=; b=SPaPKLPVjes8yDrzckQrT0FH0iDfMsCymY3I8jhHmebx+R4aCxjp4FATqlfi5c96nD w5zt+RILMKOy0oV4tDtb6xzkzoEquhyEayFJyZw2wPR5p2dr24kOmJGhMxKm60KOtsOf iZBsTtYFLTEHnqvf8oLsgVU3cRvI4e274nCkI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=iE5KNqbtr4ZP3IjheALkSbcp/BjtAZFJJ780Ws3T5HY=; b=TI0sDrt7ub9XdktWDVyp7u2ZaxRWAPSeReDHdHxAKdupuz9etmCumb8vHrvll2vusS VQ3HOx9rXD3et9UgVeZSEwfeiltgfClCCt4TF2dby8HotklYyW+iyGYl0I/fXuGuOW9y uZzd+SaAmgkaU6WZEGYgQSgbpyEa8BKtmA6CmLpdeUOqBIp6JepGhWASj1cALJyimUJK XTK16INIfHMg4Dq4nmj2z0VQ0D+zlwyxzT4ycBLfH6vD1/t3bGxPBkUCGCZS3rnsDvNR 3OgpLrFqKtED0LfTlivvsdID3myiufRI7JxA+4Fil435GdMtkCS/EsZIIDtNBtQzulao asXA== X-Gm-Message-State: AKGB3mK5IrL36YKUOeqOoQpHEzHJMzcYwjrSYYojbvEIN77uvG1SaZTT t7Fbvdlvre6vX9qEKPKgCFEQJaiBxGM= X-Received: by 10.159.242.196 with SMTP id x4mr412944plw.342.1513617502765; Mon, 18 Dec 2017 09:18:22 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.21 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:21 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:46 -0800 Message-Id: <20171218171758.16964-15-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::241 Subject: [Qemu-devel] [PATCH v7 14/26] tcg: Add generic vector ops for comparisons X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 30 +++++++ tcg/i386/tcg-target.h | 1 + tcg/tcg-op-gvec.h | 4 + tcg/tcg-op.h | 3 + tcg/tcg-opc.h | 2 + tcg/tcg.h | 1 + accel/tcg/tcg-runtime-gvec.c | 24 +++++ tcg/tcg-op-gvec.c | 202 +++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 23 +++++ tcg/tcg.c | 2 + tcg/README | 4 + 11 files changed, 296 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index cb05a755b8..28abf30d76 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -193,3 +193,33 @@ DEF_HELPER_FLAGS_4(gvec_trn8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_eq64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_ne8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ne16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ne32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ne64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_lt8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_lt16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_lt32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_lt64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_le8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_le16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_le32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_le64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_ltu8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ltu16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ltu32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ltu64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_leu8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_leu16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_leu32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_leu64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 92d533eb92..46c4dca7be 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -183,6 +183,7 @@ extern bool have_avx2; #define TCG_TARGET_HAS_zip_vec 0 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 +#define TCG_TARGET_HAS_cmp_vec 0 #define TCG_TARGET_deposit_i32_valid(ofs, len) \ (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \ diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index de2c0e669a..308bdc13b4 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -178,6 +178,10 @@ void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, + uint32_t aofs, uint32_t bofs, + uint32_t opsz, uint32_t clsz); + /* * 64-bit vector operations. Use these when the register has been allocated * with tcg_global_mem_new_i64, and so we cannot also address it via pointer. diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 83478ab006..b4f73c6048 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -939,6 +939,9 @@ void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r, + TCGv_vec a, TCGv_vec b); + void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index a085fc077b..d3fa014507 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -248,6 +248,8 @@ DEF(uzpo_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) DEF(trne_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) DEF(trno_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) +DEF(cmp_vec, 1, 2, 1, IMPLVEC) + DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) #if TCG_TARGET_MAYBE_vec diff --git a/tcg/tcg.h b/tcg/tcg.h index bdfe058c45..ceef8742b5 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -184,6 +184,7 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_zip_vec 0 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 +#define TCG_TARGET_HAS_cmp_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index fba62f1192..e0cde3216f 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -520,3 +520,27 @@ DO_TRN(gvec_trn8, uint8_t) DO_TRN(gvec_trn16, uint16_t) DO_TRN(gvec_trn32, uint32_t) DO_TRN(gvec_trn64, uint64_t) + +#define DO_CMP1(NAME, TYPE, OP) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t i; \ + for (i = 0; i < oprsz; i += sizeof(vec64)) { \ + *(TYPE *)(d + i) = *(TYPE *)(a + i) OP *(TYPE *)(b + i); \ + } \ + clear_high(d, oprsz, desc); \ +} + +#define DO_CMP2(SZ) \ + DO_CMP1(gvec_eq##SZ, vec##SZ, ==) \ + DO_CMP1(gvec_ne##SZ, vec##SZ, !=) \ + DO_CMP1(gvec_lt##SZ, svec##SZ, <) \ + DO_CMP1(gvec_le##SZ, svec##SZ, <=) \ + DO_CMP1(gvec_ltu##SZ, vec##SZ, <) \ + DO_CMP1(gvec_leu##SZ, vec##SZ, <=) + +DO_CMP2(8) +DO_CMP2(16) +DO_CMP2(32) +DO_CMP2(64) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index b29a30a78b..0db48bc75a 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1721,3 +1721,205 @@ void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_debug_assert(vece <= MO_64); tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); } + +/* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ +static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t opsz, TCGCond cond) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + tcg_gen_ld_i32(t1, cpu_env, bofs + i); + tcg_gen_setcond_i32(cond, t0, t0, t1); + tcg_gen_neg_i32(t0, t0); + tcg_gen_st_i32(t0, cpu_env, dofs + i); + } + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + +static void expand_cmp_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t opsz, TCGCond cond) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + tcg_gen_ld_i64(t1, cpu_env, bofs + i); + tcg_gen_setcond_i64(cond, t0, t0, t1); + tcg_gen_neg_i64(t0, t0); + tcg_gen_st_i64(t0, cpu_env, dofs + i); + } + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + +static void expand_cmp_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t tysz, + TCGType type, TCGCond cond) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + tcg_gen_ld_vec(t1, cpu_env, bofs + i); + tcg_gen_cmp_vec(cond, vece, t0, t0, t1); + tcg_gen_st_vec(t0, cpu_env, dofs + i); + } + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + +void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, + uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz) +{ + static gen_helper_gvec_3 * const eq_fn[4] = { + gen_helper_gvec_eq8, gen_helper_gvec_eq16, + gen_helper_gvec_eq32, gen_helper_gvec_eq64 + }; + static gen_helper_gvec_3 * const ne_fn[4] = { + gen_helper_gvec_ne8, gen_helper_gvec_ne16, + gen_helper_gvec_ne32, gen_helper_gvec_ne64 + }; + static gen_helper_gvec_3 * const lt_fn[4] = { + gen_helper_gvec_lt8, gen_helper_gvec_lt16, + gen_helper_gvec_lt32, gen_helper_gvec_lt64 + }; + static gen_helper_gvec_3 * const le_fn[4] = { + gen_helper_gvec_le8, gen_helper_gvec_le16, + gen_helper_gvec_le32, gen_helper_gvec_le64 + }; + static gen_helper_gvec_3 * const ltu_fn[4] = { + gen_helper_gvec_ltu8, gen_helper_gvec_ltu16, + gen_helper_gvec_ltu32, gen_helper_gvec_ltu64 + }; + static gen_helper_gvec_3 * const leu_fn[4] = { + gen_helper_gvec_leu8, gen_helper_gvec_leu16, + gen_helper_gvec_leu32, gen_helper_gvec_leu64 + }; + gen_helper_gvec_3 *fn; + uint32_t tmp; + + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, maxsz); + + if (cond == TCG_COND_NEVER || cond == TCG_COND_ALWAYS) { + tcg_gen_gvec_dup32i(dofs, oprsz, maxsz, -(cond == TCG_COND_ALWAYS)); + return; + } + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32) + && tcg_can_emit_vec_op(INDEX_op_cmp_vec, TCG_TYPE_V256, vece)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_cmp_vec(vece, dofs, aofs, bofs, done, 32, TCG_TYPE_V256, cond); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16) + && tcg_can_emit_vec_op(INDEX_op_cmp_vec, TCG_TYPE_V128, vece)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_cmp_vec(vece, dofs, aofs, bofs, done, 16, TCG_TYPE_V128, cond); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 + && (TCG_TARGET_REG_BITS == 32 || vece != MO_64) + && tcg_can_emit_vec_op(INDEX_op_cmp_vec, TCG_TYPE_V64, vece)) { + expand_cmp_vec(vece, dofs, aofs, bofs, done, 8, TCG_TYPE_V64, cond); + } else if (vece == MO_64) { + expand_cmp_i64(dofs, aofs, bofs, done, cond); + } else { + done = 0; + } + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (vece == MO_32 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_cmp_i32(dofs, aofs, bofs, done, cond); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + switch (cond) { + case TCG_COND_EQ: + fn = eq_fn[vece]; + break; + case TCG_COND_NE: + fn = ne_fn[vece]; + break; + case TCG_COND_GT: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LT: + fn = lt_fn[vece]; + break; + case TCG_COND_GE: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LE: + fn = le_fn[vece]; + break; + case TCG_COND_GTU: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LTU: + fn = ltu_fn[vece]; + break; + case TCG_COND_GEU: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LEU: + fn = leu_fn[vece]; + break; + default: + g_assert_not_reached(); + } + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, fn); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index 502c5ba891..2c636ebbd6 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -480,3 +480,26 @@ void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) { do_interleave(INDEX_op_trno_vec, vece, r, a, b); } + +void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, + TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGArg bi = temp_arg(bt); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + can = tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece); + if (can > 0) { + vec_gen_4(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond); + } +} diff --git a/tcg/tcg.c b/tcg/tcg.c index f82d6e80b0..a85547a6d2 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1392,6 +1392,7 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_and_vec: case INDEX_op_or_vec: case INDEX_op_xor_vec: + case INDEX_op_cmp_vec: return have_vec; case INDEX_op_dup2_vec: return have_vec && TCG_TARGET_REG_BITS == 32; @@ -1791,6 +1792,7 @@ void tcg_dump_ops(TCGContext *s) case INDEX_op_brcond_i64: case INDEX_op_setcond_i64: case INDEX_op_movcond_i64: + case INDEX_op_cmp_vec: if (op->args[k] < ARRAY_SIZE(cond_name) && cond_name[op->args[k]]) { col += qemu_log(",%s", cond_name[op->args[k++]]); diff --git a/tcg/README b/tcg/README index 75db47922d..18b6bbd8f1 100644 --- a/tcg/README +++ b/tcg/README @@ -630,6 +630,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. v0[2i + 1] = v2[2i + part]; } +* cmp_vec v0, v1, v2, cond + + Compare vectors by element, storing -1 for true and 0 for false. + ********* Note 1: Some shortcuts are defined when the last operand is known to be From patchwork Mon Dec 18 17:17:47 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122259 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3116115qgn; Mon, 18 Dec 2017 09:34:37 -0800 (PST) X-Google-Smtp-Source: ACJfBov4kXp4/k1J1BS3VbFiMlq6RcNosSGYCegEgbsPRaiE6ouuFyoTrqtKrs9tYl0TQukSsRFs X-Received: by 10.129.160.194 with SMTP id x185mr413377ywg.39.1513618477268; Mon, 18 Dec 2017 09:34:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618477; cv=none; d=google.com; s=arc-20160816; b=dOPaqg6YR5fRha1JH6EYqqOMhx1+XYCJNOAdkwTsffwMcQjghPF0QzMQJwLtg98IlE l/t0qoGFh8/uiwJKT9zf3H5n5l3VQDVbYGFu/0SaykHD2vyYD2jTQLA1T+HX/dzXnyNF VIa1t0uPzsh+MEJaWFFY14R2VZ8HQ23GjtXRyvTTuilTeJo42YOnI9NT3QxZOCe5EIje qSAfENqxn0qkv+yyKWX9OauRZtNWENfZkfpaOmCT2LvEfAjeKAigqBEdFOWSWJr67qfH WkACRSS4tc+XyPmMIIsSbFyi/eu+5PhXzFaXIp5fxgKgRJ+QzYOxEZ6EuhK+oRowNwfq W/Mg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=cM2dlrybJfMe8tAwWH5AbSc/+9BwG8aOe2MOv/DJn6A=; b=x4IfdZEs7DRQfgN8krZ6l+1YuGx3jOxrgaALOSapDbrdqthUhHuhcRPiZKUSJ4FnNs /NSUHVdf9cCjmGZRz1lPFsqlf4lTdBQ8Xs3E517fSKHJlc2M7ZiZRXEZqLqiEfLlv/2E Y8ubX7l+SuJij82MKCiWl1u5qGBDi5qeilsu8ue3b3EJbd/PZlOW5ePoaLzeODNf1sFf 6Lel2RjTQeZna6++xNAfmKIgLBTsLdCFWdevacntEAWryMWkIpIytDc2xw7ogSzMsdRC POkD5ozdKclPwIzGqyWiTvnodnp9TddXh2LhrH85dDbMHfEMGrEbBziU9Ztb2IDvi/5Z rpxg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=QvI91Kgn; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id w134si2486796ywd.801.2017.12.18.09.34.37 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:34:37 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=QvI91Kgn; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58617 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzJE-0004g1-Lq for patch@linaro.org; Mon, 18 Dec 2017 12:34:36 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37303) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3d-0000Xd-Tn for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3Z-0001EI-Ri for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:29 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:40222) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3Z-0001DN-J9 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:25 -0500 Received: by mail-pl0-x244.google.com with SMTP id 62so3753503pld.7 for ; Mon, 18 Dec 2017 09:18:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=cM2dlrybJfMe8tAwWH5AbSc/+9BwG8aOe2MOv/DJn6A=; b=QvI91KgnIbaymMtJw+Qj3Dpt4BY2CW6KFHFF0kKtRticA33Wh7YHst1MnU+ag81YBp eWifADbaKuuAEwEzPREFXFhp67U6PwWTjZ9J0h4tzAOdfB9bDCh6/H9yWpeWeJXbfn0v i/NEXpqezfytY5VQoT4cj0RtXO6mFO24TloFs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=cM2dlrybJfMe8tAwWH5AbSc/+9BwG8aOe2MOv/DJn6A=; b=cHDKNAycdOXmust5qu565HwGnai6tkQ6PY4k2p+1opYY1+Ansl4hyCWr3Vq3BVSMGf S8OTuLhrPpCgCwrbP4echFyVc0XLDiE6E5ztbSC481YnarTOZcxrOjQTtfxCM15LTeJB h4mm1EcUQl/WhtKC8LJRk34MjmLyYrdTaEUwy+X1tNNg8z28lgbN/ejwPYly6AvWHc+y xTDMq5k1sdtdARV9Gj8OUIFi1djvEmhkcytC+dmA4W0foXk/2M3rpQhxnE13Deqffvnm QGuhVTD9ZYbzdBdFFg7enH0AO57X5+vEdRDwYo7+K9Hwfo1pGMEfw7Shc/yrzYbaNORO zT9Q== X-Gm-Message-State: AKGB3mI6yKJnrdrTsb/vjr0ZIdirJ8EMdGQb5Wv8B2gKZLR2z3kxAs8X sibHednXrgRl9UbaHW3AwrqZveDNNyg= X-Received: by 10.84.137.169 with SMTP id 38mr422714pln.36.1513617504291; Mon, 18 Dec 2017 09:18:24 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.22 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:23 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:47 -0800 Message-Id: <20171218171758.16964-16-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v7 15/26] target/arm: Use vector infrastructure for aa64 compares X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 96 ++++++++++++++++++++++++++++++---------------- 1 file changed, 62 insertions(+), 34 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index c47faa5633..1ea7e37b03 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -7115,6 +7115,28 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn) } } +/* CMTST : test is "if (X & Y != 0)". */ +static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_and_i32(d, a, b); + tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0); + tcg_gen_neg_i32(d, d); +} + +static void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_and_i64(d, a, b); + tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0); + tcg_gen_neg_i64(d, d); +} + +static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_and_vec(vece, d, a, b); + tcg_gen_dupi_vec(vece, a, 0); + tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a); +} + static void handle_3same_64(DisasContext *s, int opcode, bool u, TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm) { @@ -7158,10 +7180,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u, cond = TCG_COND_EQ; goto do_cmop; } - /* CMTST : test is "if (X & Y != 0)". */ - tcg_gen_and_i64(tcg_rd, tcg_rn, tcg_rm); - tcg_gen_setcondi_i64(TCG_COND_NE, tcg_rd, tcg_rd, 0); - tcg_gen_neg_i64(tcg_rd, tcg_rd); + gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm); break; case 0x8: /* SSHL, USHL */ if (u) { @@ -9684,6 +9703,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) int rd = extract32(insn, 0, 5); int pass; GVecGen3Fn *gvec_op; + TCGCond cond; switch (opcode) { case 0x13: /* MUL, PMUL */ @@ -9731,6 +9751,44 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s)); return; + case 0x11: + if (u) { /* CMEQ */ + cond = TCG_COND_EQ; + goto do_gvec_cmp; + } else { /* CMTST */ + static const GVecGen3 cmtst_op[4] = { + { .fni4 = gen_helper_neon_tst_u8, + .fniv = gen_cmtst_vec, + .vece = MO_8 }, + { .fni4 = gen_helper_neon_tst_u16, + .fniv = gen_cmtst_vec, + .vece = MO_16 }, + { .fni4 = gen_cmtst_i32, + .fniv = gen_cmtst_vec, + .vece = MO_32 }, + { .fni8 = gen_cmtst_i64, + .fniv = gen_cmtst_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + tcg_gen_gvec_3(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), + &cmtst_op[size]); + } + return; + case 0x06: /* CMGT, CMHI */ + cond = u ? TCG_COND_GTU : TCG_COND_GT; + goto do_gvec_cmp; + case 0x07: /* CMGE, CMHS */ + cond = u ? TCG_COND_GEU : TCG_COND_GE; + do_gvec_cmp: + tcg_gen_gvec_cmp(cond, size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; } if (size == 3) { @@ -9813,26 +9871,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genenvfn = fns[size][u]; break; } - case 0x6: /* CMGT, CMHI */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_cgt_s8, gen_helper_neon_cgt_u8 }, - { gen_helper_neon_cgt_s16, gen_helper_neon_cgt_u16 }, - { gen_helper_neon_cgt_s32, gen_helper_neon_cgt_u32 }, - }; - genfn = fns[size][u]; - break; - } - case 0x7: /* CMGE, CMHS */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_cge_s8, gen_helper_neon_cge_u8 }, - { gen_helper_neon_cge_s16, gen_helper_neon_cge_u16 }, - { gen_helper_neon_cge_s32, gen_helper_neon_cge_u32 }, - }; - genfn = fns[size][u]; - break; - } case 0x8: /* SSHL, USHL */ { static NeonGenTwoOpFn * const fns[3][2] = { @@ -9905,16 +9943,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn = fns[size][u]; break; } - case 0x11: /* CMTST, CMEQ */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_tst_u8, gen_helper_neon_ceq_u8 }, - { gen_helper_neon_tst_u16, gen_helper_neon_ceq_u16 }, - { gen_helper_neon_tst_u32, gen_helper_neon_ceq_u32 }, - }; - genfn = fns[size][u]; - break; - } case 0x13: /* MUL, PMUL */ if (u) { /* PMUL */ From patchwork Mon Dec 18 17:17:48 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122262 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3119798qgn; Mon, 18 Dec 2017 09:38:18 -0800 (PST) X-Google-Smtp-Source: ACJfBos5s3qTCRKm/O2KM/rZkXpGlzGPx+zosyuLSdtbesq9O9JRnjFsLRL/ebt7oKkN/8TkeLCv X-Received: by 10.37.57.73 with SMTP id g70mr427876yba.379.1513618698370; Mon, 18 Dec 2017 09:38:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618698; cv=none; d=google.com; s=arc-20160816; b=tyN6xREL/BPhZBSK2BlrAjDv74GfAQWIAd3jiv3J53cRbEiysi7RoEDSkHn1sIdJVU /yPKxrII+EmuvICVWGZ/qUYlbt5Ryfn1u1J8wdfY18sgvjesRzOdfvQtGNzKsbmnh+A4 Vo9UAsh5APGF6OsAxovCB0FOrlunqjE0eRmeji5/dtni8LeA9wbM6ebXr5C5YMFdVjYJ Tzjgeq0o//D5bpbwdLs0bF9/53588e8BEaLPCQop1nZbGaQNZlyEev8H8vRS8aV4zoH1 Aq1i8/lZkQnp0chSEfnv7DSagHdNzTNNqLi7TpSXjV6sTIHR/XfL6ZEzrPyBzHs6JZ/Y J9rA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=eKafHTsW1NaB6I4H9HQVYk9nTj71t8ZHDremoxjtoLY=; b=YKoEZcLihWneIuO5j7YuiBdAO7ZTplFuwy8RZ3ZSW7z3d3lC8kwpOF8ZU0XspHqhIW uyQAY+6hk+ZGV4FhUrhtAcnzxwxMiWEzIK909w2nzqAV7QtTOY4S4R9mSaWeq2AqXY4M cKV8XUoynuCu6UIFfsjJm2HrgranHSVMqmEHIjPewKCHCpMD6/Xkn6frK5goWWDMfCNk /gzFAB4VPrgGO+S5D6fVAfUqFBLUiAYgHYMlj4UqYDM4EBT7TLZlKpETSw8ptL3t+g35 LvhExXIezc10M7wFd26/FjNOjuEfBXUId0RhYLLdaQSE7nzvUw3XKb9px66R4tUKBtD6 QLow== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Sygq4RLf; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id c64si2395635ywf.784.2017.12.18.09.38.18 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:38:18 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Sygq4RLf; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58774 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzMn-0007gx-Op for patch@linaro.org; Mon, 18 Dec 2017 12:38:17 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37453) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3j-0000c8-Ep for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3b-0001Gr-QP for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:35 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:38984) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3b-0001Fg-Ba for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:27 -0500 Received: by mail-pl0-x244.google.com with SMTP id bi12so5200667plb.6 for ; Mon, 18 Dec 2017 09:18:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=eKafHTsW1NaB6I4H9HQVYk9nTj71t8ZHDremoxjtoLY=; b=Sygq4RLfUraps37/uYY4KFU5MiiVzjMuO/sPlA2JUQ5oL9QZP+IaFWmgAXx7HCENO8 JqRvE1mINt0SP/yQ6VLEHQQb3sUuWyeXRyY3XUBs0Ktk4QTtssgOdWlDoGoo80WShTGN gbvRQU4rSoCyRlrbip0zf+JWPEuA8d0OvDGfY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=eKafHTsW1NaB6I4H9HQVYk9nTj71t8ZHDremoxjtoLY=; b=GK8eD5lZQeI5L+c/uzlEgaS6TS8CGY90Mcb9EFJQrbxaxtbn92IlXoiPCgHCwCO6AE 16nvBoc7qMeEkwUILkgDTSuE/Pnaks+my7weH3X2WQdHoumrAga7ve5ybBa4TPbtJW0w +DTYVgD172lijTZ9dYO7Ck+EpZVV1TFyIaDU3Bsjg1Jp4WA5dErEj8+LyMopAChky8GI 9kn+hVjdEkcGIXKN5x4SF3wtpoLZ/kw5sUDrdTeHg3uOZfgP9qh6MvUllFgXJj3TAljg zH7zN9Tvx7yk1mUJrUpN6b4k+Z8IQZj6w9KnnISUHq5ymKhL44J8MyYwXqHK3dEHgYTl rGew== X-Gm-Message-State: AKGB3mKhzzeHtxejStWO7GHvUczjNj5g2lNhHOZatJBEBVd6EI0gaRC2 ErMJgGBp4vaXnQahZ6yinL7bjaSNA6s= X-Received: by 10.84.234.198 with SMTP id i6mr416248plt.384.1513617505757; Mon, 18 Dec 2017 09:18:25 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.24 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:24 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:48 -0800 Message-Id: <20171218171758.16964-17-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v7 16/26] tcg/i386: Add vector operations/expansions for shift/cmp/interleave X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- tcg/i386/tcg-target.h | 6 +- tcg/i386/tcg-target.opc.h | 7 + tcg/i386/tcg-target.inc.c | 595 +++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 598 insertions(+), 10 deletions(-) -- 2.14.3 diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 46c4dca7be..60d3684750 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -177,13 +177,13 @@ extern bool have_avx2; #define TCG_TARGET_HAS_orc_vec 0 #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_neg_vec 0 -#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shi_vec 1 #define TCG_TARGET_HAS_shs_vec 0 #define TCG_TARGET_HAS_shv_vec 0 -#define TCG_TARGET_HAS_zip_vec 0 +#define TCG_TARGET_HAS_zip_vec 1 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 -#define TCG_TARGET_HAS_cmp_vec 0 +#define TCG_TARGET_HAS_cmp_vec 1 #define TCG_TARGET_deposit_i32_valid(ofs, len) \ (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \ diff --git a/tcg/i386/tcg-target.opc.h b/tcg/i386/tcg-target.opc.h index 4816a6c3d4..77125ef818 100644 --- a/tcg/i386/tcg-target.opc.h +++ b/tcg/i386/tcg-target.opc.h @@ -1,3 +1,10 @@ /* Target-specific opcodes for host vector expansion. These will be emitted by tcg_expand_vec_op. For those familiar with GCC internals, consider these to be UNSPEC with names. */ + +DEF(x86_shufps_vec, 1, 2, 1, IMPLVEC) +DEF(x86_vpblendvb_vec, 1, 3, 0, IMPLVEC) +DEF(x86_blend_vec, 1, 2, 1, IMPLVEC) +DEF(x86_packss_vec, 1, 2, 0, IMPLVEC) +DEF(x86_packus_vec, 1, 2, 0, IMPLVEC) +DEF(x86_psrldq_vec, 1, 1, 1, IMPLVEC) diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index 062cf16607..694d9e5cb5 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -324,6 +324,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, # define P_REXB_RM 0 # define P_GS 0 #endif +#define P_EXT3A 0x10000 /* 0x0f 0x3a opcode prefix */ #define P_SIMDF3 0x20000 /* 0xf3 opcode prefix */ #define P_SIMDF2 0x40000 /* 0xf2 opcode prefix */ #define P_VEXL 0x80000 /* Set VEX.L = 1 */ @@ -333,6 +334,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_ARITH_GvEv (0x03) /* ... plus (ARITH_FOO << 3) */ #define OPC_ANDN (0xf2 | P_EXT38) #define OPC_ADD_GvEv (OPC_ARITH_GvEv | (ARITH_ADD << 3)) +#define OPC_BLENDPS (0x0c | P_EXT3A | P_DATA16) #define OPC_BSF (0xbc | P_EXT) #define OPC_BSR (0xbd | P_EXT) #define OPC_BSWAP (0xc8 | P_EXT) @@ -372,15 +374,33 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_MOVSLQ (0x63 | P_REXW) #define OPC_MOVZBL (0xb6 | P_EXT) #define OPC_MOVZWL (0xb7 | P_EXT) +#define OPC_PACKSSDW (0x6b | P_EXT | P_DATA16) +#define OPC_PACKSSWB (0x63 | P_EXT | P_DATA16) +#define OPC_PACKUSDW (0x2b | P_EXT38 | P_DATA16) +#define OPC_PACKUSWB (0x67 | P_EXT | P_DATA16) #define OPC_PADDB (0xfc | P_EXT | P_DATA16) #define OPC_PADDW (0xfd | P_EXT | P_DATA16) #define OPC_PADDD (0xfe | P_EXT | P_DATA16) #define OPC_PADDQ (0xd4 | P_EXT | P_DATA16) #define OPC_PAND (0xdb | P_EXT | P_DATA16) #define OPC_PANDN (0xdf | P_EXT | P_DATA16) +#define OPC_PBLENDW (0x0e | P_EXT3A | P_DATA16) #define OPC_PCMPEQB (0x74 | P_EXT | P_DATA16) +#define OPC_PCMPEQW (0x75 | P_EXT | P_DATA16) +#define OPC_PCMPEQD (0x76 | P_EXT | P_DATA16) +#define OPC_PCMPEQQ (0x29 | P_EXT38 | P_DATA16) +#define OPC_PCMPGTB (0x64 | P_EXT | P_DATA16) +#define OPC_PCMPGTW (0x65 | P_EXT | P_DATA16) +#define OPC_PCMPGTD (0x66 | P_EXT | P_DATA16) +#define OPC_PCMPGTQ (0x37 | P_EXT38 | P_DATA16) #define OPC_POR (0xeb | P_EXT | P_DATA16) +#define OPC_PSHUFB (0x00 | P_EXT38 | P_DATA16) #define OPC_PSHUFD (0x70 | P_EXT | P_DATA16) +#define OPC_PSHUFLW (0x70 | P_EXT | P_SIMDF2) +#define OPC_PSHUFHW (0x70 | P_EXT | P_SIMDF3) +#define OPC_PSHIFTW_Ib (0x71 | P_EXT | P_DATA16) /* /2 /6 /4 */ +#define OPC_PSHIFTD_Ib (0x72 | P_EXT | P_DATA16) /* /2 /6 /4 */ +#define OPC_PSHIFTQ_Ib (0x73 | P_EXT | P_DATA16) /* /2 /6 /4 */ #define OPC_PSUBB (0xf8 | P_EXT | P_DATA16) #define OPC_PSUBW (0xf9 | P_EXT | P_DATA16) #define OPC_PSUBD (0xfa | P_EXT | P_DATA16) @@ -389,6 +409,10 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_PUNPCKLWD (0x61 | P_EXT | P_DATA16) #define OPC_PUNPCKLDQ (0x62 | P_EXT | P_DATA16) #define OPC_PUNPCKLQDQ (0x6c | P_EXT | P_DATA16) +#define OPC_PUNPCKHBW (0x68 | P_EXT | P_DATA16) +#define OPC_PUNPCKHWD (0x69 | P_EXT | P_DATA16) +#define OPC_PUNPCKHDQ (0x6a | P_EXT | P_DATA16) +#define OPC_PUNPCKHQDQ (0x6d | P_EXT | P_DATA16) #define OPC_PXOR (0xef | P_EXT | P_DATA16) #define OPC_POP_r32 (0x58) #define OPC_POPCNT (0xb8 | P_EXT | P_SIMDF3) @@ -401,19 +425,26 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_SHIFT_Ib (0xc1) #define OPC_SHIFT_cl (0xd3) #define OPC_SARX (0xf7 | P_EXT38 | P_SIMDF3) +#define OPC_SHUFPS (0xc6 | P_EXT) #define OPC_SHLX (0xf7 | P_EXT38 | P_DATA16) #define OPC_SHRX (0xf7 | P_EXT38 | P_SIMDF2) #define OPC_TESTL (0x85) #define OPC_TZCNT (0xbc | P_EXT | P_SIMDF3) +#define OPC_UD2 (0x0b | P_EXT) +#define OPC_VPBLENDD (0x02 | P_EXT3A | P_DATA16) +#define OPC_VPBLENDVB (0x4c | P_EXT3A | P_DATA16) #define OPC_VPBROADCASTB (0x78 | P_EXT38 | P_DATA16) #define OPC_VPBROADCASTW (0x79 | P_EXT38 | P_DATA16) #define OPC_VPBROADCASTD (0x58 | P_EXT38 | P_DATA16) #define OPC_VPBROADCASTQ (0x59 | P_EXT38 | P_DATA16) +#define OPC_VPERMQ (0x00 | P_EXT3A | P_DATA16 | P_REXW) +#define OPC_VPERM2I128 (0x46 | P_EXT3A | P_DATA16 | P_VEXL) #define OPC_VZEROUPPER (0x77 | P_EXT) #define OPC_XCHG_ax_r32 (0x90) #define OPC_GRP3_Ev (0xf7) #define OPC_GRP5 (0xff) +#define OPC_GRP14 (0x73 | P_EXT | P_DATA16) /* Group 1 opcode extensions for 0x80-0x83. These are also used as modifiers for OPC_ARITH. */ @@ -519,10 +550,12 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x) tcg_out8(s, (uint8_t)(rex | 0x40)); } - if (opc & (P_EXT | P_EXT38)) { + if (opc & (P_EXT | P_EXT38 | P_EXT3A)) { tcg_out8(s, 0x0f); if (opc & P_EXT38) { tcg_out8(s, 0x38); + } else if (opc & P_EXT3A) { + tcg_out8(s, 0x3a); } } @@ -539,10 +572,12 @@ static void tcg_out_opc(TCGContext *s, int opc) } else if (opc & P_SIMDF2) { tcg_out8(s, 0xf2); } - if (opc & (P_EXT | P_EXT38)) { + if (opc & (P_EXT | P_EXT38 | P_EXT3A)) { tcg_out8(s, 0x0f); if (opc & P_EXT38) { tcg_out8(s, 0x38); + } else if (opc & P_EXT3A) { + tcg_out8(s, 0x3a); } } tcg_out8(s, opc); @@ -566,7 +601,7 @@ static void tcg_out_vex_opc(TCGContext *s, int opc, int r, int v, /* Use the two byte form if possible, which cannot encode VEX.W, VEX.B, VEX.X, or an m-mmmm field other than P_EXT. */ - if ((opc & (P_EXT | P_EXT38 | P_REXW)) == P_EXT + if ((opc & (P_EXT | P_EXT38 | P_EXT3A | P_REXW)) == P_EXT && ((rm | index) & 8) == 0) { /* Two byte VEX prefix. */ tcg_out8(s, 0xc5); @@ -577,7 +612,9 @@ static void tcg_out_vex_opc(TCGContext *s, int opc, int r, int v, tcg_out8(s, 0xc4); /* VEX.m-mmmm */ - if (opc & P_EXT38) { + if (opc & P_EXT3A) { + tmp = 3; + } else if (opc & P_EXT38) { tmp = 2; } else if (opc & P_EXT) { tmp = 1; @@ -2638,9 +2675,24 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static int const sub_insn[4] = { OPC_PSUBB, OPC_PSUBW, OPC_PSUBD, OPC_PSUBQ }; + static int const shift_imm_insn[4] = { + OPC_UD2, OPC_PSHIFTW_Ib, OPC_PSHIFTD_Ib, OPC_PSHIFTQ_Ib + }; + static int const cmpeq_insn[4] = { + OPC_PCMPEQB, OPC_PCMPEQW, OPC_PCMPEQD, OPC_PCMPEQQ + }; + static int const cmpgt_insn[4] = { + OPC_PCMPGTB, OPC_PCMPGTW, OPC_PCMPGTD, OPC_PCMPGTQ + }; + static int const punpckl_insn[4] = { + OPC_PUNPCKLBW, OPC_PUNPCKLWD, OPC_PUNPCKLDQ, OPC_PUNPCKLQDQ + }; + static int const punpckh_insn[4] = { + OPC_PUNPCKHBW, OPC_PUNPCKHWD, OPC_PUNPCKHDQ, OPC_PUNPCKHQDQ + }; TCGType type = vecl + TCG_TYPE_V64; - int insn; + int insn, sub; TCGArg a0, a1, a2; a0 = args[0]; @@ -2662,6 +2714,31 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, goto gen_simd; case INDEX_op_xor_vec: insn = OPC_PXOR; + goto gen_simd; + case INDEX_op_zipl_vec: + insn = punpckl_insn[vece]; + goto gen_simd; + case INDEX_op_ziph_vec: + insn = punpckh_insn[vece]; + goto gen_simd; + case INDEX_op_x86_packss_vec: + if (vece == MO_8) { + insn = OPC_PACKSSWB; + } else if (vece == MO_16) { + insn = OPC_PACKSSDW; + } else { + g_assert_not_reached(); + } + goto gen_simd; + case INDEX_op_x86_packus_vec: + if (vece == MO_8) { + insn = OPC_PACKUSWB; + } else if (vece == MO_16) { + insn = OPC_PACKUSDW; + } else { + g_assert_not_reached(); + } + goto gen_simd; gen_simd: if (type == TCG_TYPE_V256) { insn |= P_VEXL; @@ -2669,6 +2746,17 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out_vex_modrm(s, insn, a0, a1, a2); break; + case INDEX_op_cmp_vec: + sub = args[3]; + if (sub == TCG_COND_EQ) { + insn = cmpeq_insn[vece]; + } else if (sub == TCG_COND_GT) { + insn = cmpgt_insn[vece]; + } else { + g_assert_not_reached(); + } + goto gen_simd; + case INDEX_op_andc_vec: insn = OPC_PANDN; if (type == TCG_TYPE_V256) { @@ -2677,6 +2765,25 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out_vex_modrm(s, insn, a0, a2, a1); break; + case INDEX_op_shli_vec: + sub = 6; + goto gen_shift; + case INDEX_op_shri_vec: + sub = 2; + goto gen_shift; + case INDEX_op_sari_vec: + tcg_debug_assert(vece != MO_64); + sub = 4; + gen_shift: + tcg_debug_assert(vece != MO_8); + insn = shift_imm_insn[vece]; + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, sub, a0, a1); + tcg_out8(s, a2); + break; + case INDEX_op_ld_vec: tcg_out_ld(s, type, a0, a1, a2); break; @@ -2690,6 +2797,42 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, tcg_out_dup_vec(s, type, vece, a0, a1); break; + case INDEX_op_x86_shufps_vec: + insn = OPC_SHUFPS; + sub = args[3]; + goto gen_simd_imm8; + case INDEX_op_x86_blend_vec: + if (vece == MO_16) { + insn = OPC_PBLENDW; + } else if (vece == MO_32) { + insn = (have_avx2 ? OPC_VPBLENDD : OPC_BLENDPS); + } else { + g_assert_not_reached(); + } + sub = args[3]; + goto gen_simd_imm8; + gen_simd_imm8: + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a1, a2); + tcg_out8(s, sub); + break; + + case INDEX_op_x86_vpblendvb_vec: + insn = OPC_VPBLENDVB; + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a1, a2); + tcg_out8(s, args[3] << 4); + break; + + case INDEX_op_x86_psrldq_vec: + tcg_out_vex_modrm(s, OPC_GRP14, 3, a0, a1); + tcg_out8(s, a2); + break; + default: g_assert_not_reached(); } @@ -2720,6 +2863,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) = { .args_ct_str = { "L", "L", "L", "L" } }; static const TCGTargetOpDef x_x = { .args_ct_str = { "x", "x" } }; static const TCGTargetOpDef x_x_x = { .args_ct_str = { "x", "x", "x" } }; + static const TCGTargetOpDef x_x_x_x + = { .args_ct_str = { "x", "x", "x", "x" } }; static const TCGTargetOpDef x_r = { .args_ct_str = { "x", "r" } }; switch (op) { @@ -2932,9 +3077,22 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) case INDEX_op_or_vec: case INDEX_op_xor_vec: case INDEX_op_andc_vec: + case INDEX_op_cmp_vec: + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + case INDEX_op_x86_shufps_vec: + case INDEX_op_x86_blend_vec: + case INDEX_op_x86_packss_vec: + case INDEX_op_x86_packus_vec: return &x_x_x; case INDEX_op_dup_vec: + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + case INDEX_op_x86_psrldq_vec: return &x_x; + case INDEX_op_x86_vpblendvb_vec: + return &x_x_x_x; default: break; @@ -2951,16 +3109,439 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_or_vec: case INDEX_op_xor_vec: case INDEX_op_andc_vec: - return true; + return 1; + case INDEX_op_cmp_vec: + return -1; + + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + /* We must expand the operation for MO_8. */ + return vece == MO_8 ? -1 : 1; + + case INDEX_op_sari_vec: + /* We must expand the operation for MO_8. */ + if (vece == MO_8) { + return -1; + } + /* We can emulate this for MO_64, but it does not pay off + unless we're producing at least 4 values. */ + if (vece == MO_64) { + return type >= TCG_TYPE_V256 ? -1 : 0; + } + return 1; + + case INDEX_op_zipl_vec: + /* We could support v256, but with 3 insns per opcode. + It is better to expand with v128 instead. */ + return type <= TCG_TYPE_V128; + case INDEX_op_ziph_vec: + if (type == TCG_TYPE_V64) { + return -1; + } + return type == TCG_TYPE_V128; + + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + /* ??? Not implemented for V256. */ + return -(type <= TCG_TYPE_V128); default: - return false; + return 0; } } void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, TCGArg a0, ...) { + va_list va; + TCGArg a1, a2; + TCGv_vec v0, v1, v2, t1, t2; + + va_start(va, a0); + v0 = temp_tcgv_vec(arg_temp(a0)); + + switch (opc) { + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + tcg_debug_assert(vece == MO_8); + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + /* Unpack to W, shift, and repack. Tricky bits: + (1) Use punpck*bw x,x to produce DDCCBBAA, + i.e. duplicate in other half of the 16-bit lane. + (2) For right-shift, add 8 so that the high half of + the lane becomes zero. For left-shift, we must + shift up and down again. + (3) Step 2 leaves high half zero such that PACKUSWB + (pack with unsigned saturation) does not modify + the quantity. */ + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_zipl_vec, type, MO_8, tcgv_vec_arg(t1), a1, a1); + vec_gen_3(INDEX_op_ziph_vec, type, MO_8, tcgv_vec_arg(t2), a1, a1); + if (opc == INDEX_op_shri_vec) { + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), a2 + 8); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2 + 8); + } else { + vec_gen_3(INDEX_op_shli_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), a2 + 8); + vec_gen_3(INDEX_op_shli_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2 + 8); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), 8); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), 8); + } + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + + case INDEX_op_sari_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + if (vece == MO_8) { + /* Unpack to W, shift, and repack, as above. */ + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_zipl_vec, type, MO_8, tcgv_vec_arg(t1), a1, a1); + vec_gen_3(INDEX_op_ziph_vec, type, MO_8, tcgv_vec_arg(t2), a1, a1); + vec_gen_3(INDEX_op_sari_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), a2 + 8); + vec_gen_3(INDEX_op_sari_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2 + 8); + vec_gen_3(INDEX_op_x86_packss_vec, type, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + } + tcg_debug_assert(vece == MO_64); + /* MO_64: If the shift is <= 32, we can emulate the sign extend by + performing an arithmetic 32-bit shift and overwriting the high + half of the result (note that the ISA says shift of 32 is valid). */ + if (a2 <= 32) { + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_sari_vec, type, MO_32, tcgv_vec_arg(t1), a1, a2); + vec_gen_3(INDEX_op_sari_vec, type, MO_64, a0, a1, a2); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_32, + a0, a0, tcgv_vec_arg(t1), 0xaa); + tcg_temp_free_vec(t1); + break; + } + /* Otherwise we will need to use a compare vs 0 to produce the + sign-extend, shift and merge. */ + t1 = tcg_temp_new_vec(type); + t2 = tcg_const_zeros_vec(type); + vec_gen_4(INDEX_op_cmp_vec, type, MO_64, + tcgv_vec_arg(t1), tcgv_vec_arg(t2), a1, TCG_COND_GT); + tcg_temp_free_vec(t2); + vec_gen_3(INDEX_op_shri_vec, type, MO_64, a0, a1, a2); + vec_gen_3(INDEX_op_shli_vec, type, MO_64, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), 64 - a2); + vec_gen_3(INDEX_op_or_vec, type, MO_64, a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + + case INDEX_op_ziph_vec: + tcg_debug_assert(type == TCG_TYPE_V64); + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, vece, a0, a1, a2); + vec_gen_3(INDEX_op_x86_psrldq_vec, TCG_TYPE_V128, MO_64, a0, a0, 8); + break; + + case INDEX_op_uzpe_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + v1 = temp_tcgv_vec(arg_temp(a1)); + v2 = temp_tcgv_vec(arg_temp(a2)); + + if (type == TCG_TYPE_V128) { + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + tcg_gen_dup16i_vec(t2, 0x00ff); + tcg_gen_and_vec(MO_16, t1, v2, t2); + tcg_gen_and_vec(MO_16, v0, v1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + tcg_gen_dup32i_vec(t2, 0x0000ffff); + tcg_gen_and_vec(MO_32, t1, v2, t2); + tcg_gen_and_vec(MO_32, v0, v1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_16, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_32: + vec_gen_4(INDEX_op_x86_shufps_vec, type, MO_32, + a0, a1, a2, 0x88); + break; + case MO_64: + tcg_gen_zipl_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } else { + tcg_debug_assert(type == TCG_TYPE_V64); + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup16i_vec(t2, 0x00ff); + tcg_gen_and_vec(MO_16, t1, t1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup32i_vec(t2, 0x0000ffff); + tcg_gen_and_vec(MO_32, t1, t1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_16, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_32: + tcg_gen_zipl_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } + break; + + case INDEX_op_uzpo_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + v1 = temp_tcgv_vec(arg_temp(a1)); + v2 = temp_tcgv_vec(arg_temp(a2)); + + if (type == TCG_TYPE_V128) { + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + tcg_gen_shri_vec(MO_16, t1, v2, 8); + tcg_gen_shri_vec(MO_16, v0, v1, 8); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + tcg_gen_shri_vec(MO_32, t1, v2, 16); + tcg_gen_shri_vec(MO_32, v0, v1, 16); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_16, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_32: + vec_gen_4(INDEX_op_x86_shufps_vec, type, MO_32, + a0, a1, a2, 0xdd); + break; + case MO_64: + tcg_gen_ziph_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } else { + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_16: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + tcg_gen_shri_vec(MO_32, t1, t1, 16); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_16, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_32: + tcg_gen_ziph_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } + break; + + case INDEX_op_trne_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shli_vec, type, MO_16, + tcgv_vec_arg(t1), a2, 8); + tcg_gen_dup16i_vec(t2, 0xff00); + vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, MO_8, + a0, a1, tcgv_vec_arg(t1), tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shli_vec, type, MO_32, + tcgv_vec_arg(t1), a2, 16); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_16, + a0, a1, tcgv_vec_arg(t1), 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_32: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shli_vec, type, MO_64, + tcgv_vec_arg(t1), a2, 32); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_32, + a0, a1, tcgv_vec_arg(t1), 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_64: + vec_gen_3(INDEX_op_zipl_vec, type, MO_64, a0, a1, a2); + break; + default: + g_assert_not_reached(); + } + break; + + case INDEX_op_trno_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t1), a1, 8); + tcg_gen_dup16i_vec(t2, 0xff00); + vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, MO_8, + a0, tcgv_vec_arg(t1), a2, tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shri_vec, type, MO_32, + tcgv_vec_arg(t1), a1, 16); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_16, + a0, tcgv_vec_arg(t1), a2, 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_32: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shri_vec, type, MO_64, + tcgv_vec_arg(t1), a1, 32); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_32, + a0, tcgv_vec_arg(t1), a2, 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_64: + vec_gen_3(INDEX_op_ziph_vec, type, MO_64, a0, a1, a2); + break; + default: + g_assert_not_reached(); + } + break; + + case INDEX_op_cmp_vec: + { + enum { + NEED_SWAP = 1, + NEED_INV = 2, + NEED_BIAS = 4 + }; + static const uint8_t fixups[16] = { + [0 ... 15] = -1, + [TCG_COND_EQ] = 0, + [TCG_COND_NE] = NEED_INV, + [TCG_COND_GT] = 0, + [TCG_COND_LT] = NEED_SWAP, + [TCG_COND_LE] = NEED_INV, + [TCG_COND_GE] = NEED_SWAP | NEED_INV, + [TCG_COND_GTU] = NEED_BIAS, + [TCG_COND_LTU] = NEED_BIAS | NEED_SWAP, + [TCG_COND_LEU] = NEED_BIAS | NEED_INV, + [TCG_COND_GEU] = NEED_BIAS | NEED_SWAP | NEED_INV, + }; + + TCGCond cond; + uint8_t fixup; + + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + cond = va_arg(va, TCGArg); + fixup = fixups[cond & 15]; + tcg_debug_assert(fixup != 0xff); + + if (fixup & NEED_INV) { + cond = tcg_invert_cond(cond); + } + if (fixup & NEED_SWAP) { + TCGArg t; + t = a1, a1 = a2, a2 = t; + cond = tcg_swap_cond(cond); + } + + t1 = t2 = NULL; + if (fixup & NEED_BIAS) { + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1)); + tcg_gen_sub_vec(vece, t1, temp_tcgv_vec(arg_temp(a1)), t2); + tcg_gen_sub_vec(vece, t2, temp_tcgv_vec(arg_temp(a2)), t2); + a1 = tcgv_vec_arg(t1); + a2 = tcgv_vec_arg(t2); + cond = tcg_signed_cond(cond); + } + + tcg_debug_assert(cond == TCG_COND_EQ || cond == TCG_COND_GT); + vec_gen_4(INDEX_op_cmp_vec, type, vece, a0, a1, a2, cond); + + if (fixup & NEED_BIAS) { + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + } + if (fixup & NEED_INV) { + tcg_gen_not_vec(vece, v0, v0); + } + } + break; + + default: + break; + } + + va_end(va); } static const int tcg_target_callee_save_regs[] = { From patchwork Mon Dec 18 17:17:49 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122250 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3106521qgn; Mon, 18 Dec 2017 09:25:23 -0800 (PST) X-Google-Smtp-Source: ACJfBovwupDpxotrpjq32ixsSGV9xJ6MwyjMwVDKYlg32gKCcCqpMkVWWMkz8SOvIUnQ+rBmCrDT X-Received: by 10.129.4.82 with SMTP id 79mr355684ywe.415.1513617923722; Mon, 18 Dec 2017 09:25:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513617923; cv=none; d=google.com; s=arc-20160816; b=ZUDB4Bm+EiWCphANBYLfWar9Yhefbrps2bny2ewE4rbIrqipNjCT7kWcQKihb80iwa NtxQ4gDlVIz3K+BJ0iz2jHJqiZZ26Ydr2qCcIa+pbvhynd9e8hMaWLw363QGhmhJVU4i lsyFagJVoD78ZkQAc3FhNq0prVatMUtxSRvD4tXw251lNr4WCr768XG6iCCkuA/e0z8T Yy/lIIkz8/TsypvnawYK2UGTLRzqU3G7nB1NonrXAVkBLZhJSrwUsH4c/4kzPdRPEuvV QzhOPuDZ+YrmP7QCKXFaxW2hIKXw1v7qOt5grgFvvWZ76gdMv43po3KjnV1Gxjx2hZyc atTw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=Hzl5Yt92Avp6R8g0pdxXcTQ2/JKjXhwXb3U4KdeBdhI=; b=fu3jeJ+ZgqaZyz14tcBsWaDzkgOuX8OJyM4mpyf5sUXacf68oqkLtqjP/fj/ol8IEd bw7iEa/uQqgNLnTKiFtZWBBcLsTERa61Rnn4jCSsmfVdGJkDqN3eR2qVbQzbTpCtJXse 8qAgpl1eguMlQtu5f3CC+O7gxZeQCiiDk2WVvjQeeuTEpQDxhhTAxUXL6ZpNhL4cQ6O8 2q3dAPSQEBJCQdr/TIrKV6NmzeC1J+HHtM0RLP055PDGzsC3RDhIR37fPuyAGAAP6WwO 4VZo1ydCOl1qflscukKW0Y0bN+aijDScEq02eJypmFT8NuxRfodenWv4tEb8VCVVZAED sb+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=DRDnKtEF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id p205si2471004ywp.110.2017.12.18.09.25.23 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:25:23 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=DRDnKtEF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58570 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzAJ-0006Hz-7l for patch@linaro.org; Mon, 18 Dec 2017 12:25:23 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37400) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3h-0000by-SX for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:36 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3d-0001IX-5u for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:33 -0500 Received: from mail-pl0-x242.google.com ([2607:f8b0:400e:c01::242]:36440) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3c-0001Ha-Qp for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:29 -0500 Received: by mail-pl0-x242.google.com with SMTP id b12so5186413plm.3 for ; Mon, 18 Dec 2017 09:18:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=Hzl5Yt92Avp6R8g0pdxXcTQ2/JKjXhwXb3U4KdeBdhI=; b=DRDnKtEFnez+9At4vOpeh1e9B9bgFJMyNHnrQedhFCD6mfJlyMIBfZMY1lEkasqnzM 8B0ZK4qlFJr1ondV/VhD1HKuNjr5hf0kTZa/0J6goupL9IhPQhaRvSFuf6QmGilRAdZu m/DzBy9Tm1p21r+OhH7O4Y5ez4thNQjxEN1M0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=Hzl5Yt92Avp6R8g0pdxXcTQ2/JKjXhwXb3U4KdeBdhI=; b=PoUcCl0DlfM9FkCJSO+iy20jeiGwNK4WegsyeMR07TAyXYtTrm2ovq3WYhHnCqpu18 AFw25WdEvPPylH1XcPBuNKCuoB2Z4sltu46zdtN9JBWAYLias85JwZ1s33utvpwiC8h4 8ZsGacVsxobXBJn3qIDXlwp8o07seywk+b0BNO2ZQjmyk3YIhZyEubgJ19riM/IFDN8t gJTMKi9FL6SDQOdYvxigGn2NaaIOIB+UkOvK7qyCV+SvnzBPyJbLF5Z9q5lyRVw3mV87 S1EZKrWaYjABNwK0F9xkd5b5gs1DrMKpsSeI59oTemtdRcFpJAynGV47CXH5BtIEWAVY 1IEw== X-Gm-Message-State: AKGB3mIhN97iMGWLKTndGtCQGSP38WwpQfbyYAAPW+Z6MxSS64YCyI44 558Uac6LY3XTob0IaaY5rEUz3St6k5E= X-Received: by 10.159.233.135 with SMTP id bh7mr425107plb.52.1513617507517; Mon, 18 Dec 2017 09:18:27 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.25 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:26 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:49 -0800 Message-Id: <20171218171758.16964-18-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::242 Subject: [Qemu-devel] [PATCH v7 17/26] tcg: Add generic vector ops for multiplication X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 5 +++++ tcg/i386/tcg-target.h | 1 + tcg/tcg-op-gvec.h | 2 ++ tcg/tcg-op.h | 1 + tcg/tcg-opc.h | 1 + tcg/tcg.h | 1 + accel/tcg/tcg-runtime-gvec.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-gvec.c | 29 +++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 22 ++++++++++++++++++++++ tcg/tcg.c | 2 ++ tcg/README | 4 ++++ 11 files changed, 112 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index 28abf30d76..c4a2e6b215 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -152,6 +152,11 @@ DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 60d3684750..949d138c9d 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -184,6 +184,7 @@ extern bool have_avx2; #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_HAS_cmp_vec 1 +#define TCG_TARGET_HAS_mul_vec 0 #define TCG_TARGET_deposit_i32_valid(ofs, len) \ (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \ diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 308bdc13b4..ad5e22e1bf 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -134,6 +134,8 @@ void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index b4f73c6048..3296a7baa5 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -920,6 +920,7 @@ void tcg_gen_movi_v128(TCGv_vec, uint64_t, uint64_t); void tcg_gen_movi_v256(TCGv_vec, uint64_t, uint64_t, uint64_t, uint64_t); void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index d3fa014507..b21a30273c 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -220,6 +220,7 @@ DEF(st_vec, 0, 2, 1, IMPLVEC) DEF(add_vec, 1, 2, 0, IMPLVEC) DEF(sub_vec, 1, 2, 0, IMPLVEC) +DEF(mul_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_mul_vec)) DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec)) DEF(and_vec, 1, 2, 0, IMPLVEC) diff --git a/tcg/tcg.h b/tcg/tcg.h index ceef8742b5..64078a2e92 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -185,6 +185,7 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_HAS_cmp_vec 0 +#define TCG_TARGET_HAS_mul_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index e0cde3216f..9406ccd769 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -141,6 +141,50 @@ void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_mul8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) * *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mul16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) * *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mul32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) * *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mul64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) * *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + void HELPER(gvec_neg8)(void *d, void *a, uint32_t desc) { intptr_t oprsz = simd_oprsz(desc); diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 0db48bc75a..e6d885da66 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1108,6 +1108,35 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); } +void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul8, + .opc = INDEX_op_mul_vec, + .vece = MO_8 }, + { .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul16, + .opc = INDEX_op_mul_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_mul_i32, + .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul32, + .opc = INDEX_op_mul_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_mul_i64, + .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul64, + .opc = INDEX_op_mul_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + /* Perform a vector negation using normal negation and a mask. Compare gen_subv_mask above. */ static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m) diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index 2c636ebbd6..667e5095cb 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -503,3 +503,25 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, tcg_expand_vec_op(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond); } } + +void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGArg bi = temp_arg(bt); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + can = tcg_can_emit_vec_op(INDEX_op_mul_vec, type, vece); + if (can > 0) { + vec_gen_3(INDEX_op_mul_vec, type, vece, ri, ai, bi); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(INDEX_op_mul_vec, type, vece, ri, ai, bi); + } +} diff --git a/tcg/tcg.c b/tcg/tcg.c index a85547a6d2..5608391dca 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1404,6 +1404,8 @@ bool tcg_op_supported(TCGOpcode op) return have_vec && TCG_TARGET_HAS_andc_vec; case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; + case INDEX_op_mul_vec: + return have_vec && TCG_TARGET_HAS_mul_vec; case INDEX_op_shli_vec: case INDEX_op_shri_vec: case INDEX_op_sari_vec: diff --git a/tcg/README b/tcg/README index 18b6bbd8f1..17695ff7f6 100644 --- a/tcg/README +++ b/tcg/README @@ -547,6 +547,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. Similarly, v0 = v1 - v2. +* mul_vec v0, v1, v2 + + Similarly, v0 = v1 * v2. + * neg_vec v0, v1 Similarly, v0 = -v1. From patchwork Mon Dec 18 17:17:50 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122263 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3120234qgn; Mon, 18 Dec 2017 09:38:44 -0800 (PST) X-Google-Smtp-Source: ACJfBouh0LBXGdW2xmvD/gqm7e8Om4VGuyz+6rZHtprknVL8AZmw+vG7pE5ZNlTFgVWnfVEO2lna X-Received: by 10.37.30.67 with SMTP id e64mr462210ybe.103.1513618724752; Mon, 18 Dec 2017 09:38:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618724; cv=none; d=google.com; s=arc-20160816; b=ZWZk/iO/ynrciMmm9wg9IG8Pt+9NE2xvw5EJ94u93CIEeipOl+w6g89exNbWujmmZl NZfbNDudRkb68M8wdAqCz26UnEFeaknMOSMoaUVDDslZasMPVB6lUmoRDqVE+r4917Fi tg0efoJzQKCKy5z7h82DJ0XONzJUDUH+dz186pszCGlO+Hajd3fi+VRTuMIx5lvpsSuB tg+MUQV5w2ikyMSL+ttG6uvDxVsQcEdVuVKsxrW0BUX9RbUn5/JIzQ4WMgABl479YSky E7zQaUuk7sSCMIZMI5JqmbqIWOHwG2IeUGj3kVr/YtsVvrYE1RLWGKDWrv0MnQ4oOWog p5xw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=vnef9FRwNaBRvUohtcwSFNcDmh12xy5pNfZLhkw8F3c=; b=lkrPSlZH44h1AYzqjzk9YSpZ1H+NdmU7wGhCpclyxHdbwQJqHx2ohKOAMYni85iNKu SEhG/bLQgnozQnTvMhLHT/vytoUdtXgOO6R6di3bVoiRwnMhOHdqOvkA7PcGe9sbQgYA wu+U4ApBcWHEcwIK9vlZH6zxZIJmSXEkvdE0eQiOIYZE+aeB7v5U1hWeED4epG6UI1OW Rdi0p/Oyhy+FWggMWpp8qpkpO1cwxwr/jnhOlUqxTYGkN8MHc/Y6yyPw8r+TPxPv/Axm 72olOPo0Qu4b/Tg9WEuMKUxBe0qZvwgu4tfsfJ3ZcjvLdOkeDDbhSsmEv76lMAWKwl8r 6Zdg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=bgCqfb6H; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id p203si2525299ywe.501.2017.12.18.09.38.44 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:38:44 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=bgCqfb6H; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58776 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzNE-00088v-8j for patch@linaro.org; Mon, 18 Dec 2017 12:38:44 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37474) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3k-0000dR-Bi for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:37 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3e-0001KP-O9 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:36 -0500 Received: from mail-pl0-x243.google.com ([2607:f8b0:400e:c01::243]:43231) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3e-0001JR-DS for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:30 -0500 Received: by mail-pl0-x243.google.com with SMTP id z5so5199466plo.10 for ; Mon, 18 Dec 2017 09:18:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=vnef9FRwNaBRvUohtcwSFNcDmh12xy5pNfZLhkw8F3c=; b=bgCqfb6HAx8vxuSt2GNwaBYg5l/QzdNP79fHSKyp2Yko+27d8pMxj54VNGdv+1RcTd yVZFPxR5HS3VaZEEnloz+opLb38BJCG8jsDBtdPyEfzaeKQWc0Ne9jnhyQOf0Qslw2dQ HceV2+Gg2Oaz4iXdKDHrAjeGlS7CcR8wdxWxs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=vnef9FRwNaBRvUohtcwSFNcDmh12xy5pNfZLhkw8F3c=; b=raE4gnthOy7JmNYRnBewIK+0NFXfWwRKLUMyjDMd/SxyJBUWoOllFFDOoMgjZdPzUk bguct52zDutR/Q/i9hWttynurfM4lU4sTkNxLqoTrP8bYESfvS15Mxtarv2WI6lwIIBf kOoqPU4eqWUEju/1mxQP/XEPibKvL7fqdJE1wYNaWkB9fBIi1zfM1KfsLR67YaES9hoI sWZEWyJqO4WGDWDGUwd2/n2m+no+1JPfT5JGY3jWg4fUxx9ubavd9CfSt8vkY0uToymI e8Ow3rcdbB2S53AH96yGQdg39J+/DxAbohsqJcceXt2+46O5xbXFS/LCMVqq74zNslT7 ZLXw== X-Gm-Message-State: AKGB3mLy1SpZOV32Y2PxHKtaALMevKFW6hc3KDMoh87xXgBUCAuTQxnD YwRBanVNVWpj8wMYw2+gon4dSJXggf8= X-Received: by 10.159.203.137 with SMTP id ay9mr413452plb.380.1513617509153; Mon, 18 Dec 2017 09:18:29 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.27 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:27 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:50 -0800 Message-Id: <20171218171758.16964-19-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::243 Subject: [Qemu-devel] [PATCH v7 18/26] target/arm: Use vector infrastructure for aa64 multiplies X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 171 ++++++++++++++++++++++++++++++++++++--------- 1 file changed, 138 insertions(+), 33 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 1ea7e37b03..c47d9caa49 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -9691,6 +9691,66 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn) } } +static void gen_mla8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u8(a, a, b); + gen_helper_neon_add_u8(d, d, a); +} + +static void gen_mla16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u16(a, a, b); + gen_helper_neon_add_u16(d, d, a); +} + +static void gen_mla32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_mul_i32(a, a, b); + tcg_gen_add_i32(d, d, a); +} + +static void gen_mla64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_mul_i64(a, a, b); + tcg_gen_add_i64(d, d, a); +} + +static void gen_mla_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_mul_vec(vece, a, a, b); + tcg_gen_add_vec(vece, d, d, a); +} + +static void gen_mls8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u8(a, a, b); + gen_helper_neon_sub_u8(d, d, a); +} + +static void gen_mls16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u16(a, a, b); + gen_helper_neon_sub_u16(d, d, a); +} + +static void gen_mls32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_mul_i32(a, a, b); + tcg_gen_sub_i32(d, d, a); +} + +static void gen_mls64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_mul_i64(a, a, b); + tcg_gen_sub_i64(d, d, a); +} + +static void gen_mls_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_mul_vec(vece, a, a, b); + tcg_gen_sub_vec(vece, d, d, a); +} + /* Integer op subgroup of C3.6.16. */ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) { @@ -9702,7 +9762,8 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) int rn = extract32(insn, 5, 5); int rd = extract32(insn, 0, 5); int pass; - GVecGen3Fn *gvec_op; + GVecGen3Fn *gvec_fn; + const GVecGen3 *gvec_op; TCGCond cond; switch (opcode) { @@ -9745,12 +9806,70 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) switch (opcode) { case 0x10: /* ADD, SUB */ - gvec_op = u ? tcg_gen_gvec_sub : tcg_gen_gvec_add; - gvec_op(size, vec_full_reg_offset(s, rd), + gvec_fn = u ? tcg_gen_gvec_sub : tcg_gen_gvec_add; + do_gvec: + gvec_fn(size, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn), vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s)); return; + case 0x13: /* MUL, PMUL */ + if (!u) { /* MUL */ + gvec_fn = tcg_gen_gvec_mul; + goto do_gvec; + } + break; + case 0x12: /* MLA, MLS */ + { + static const GVecGen3 mla_op[4] = { + { .fni4 = gen_mla8_i32, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_8 }, + { .fni4 = gen_mla16_i32, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_16 }, + { .fni4 = gen_mla32_i32, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_32 }, + { .fni8 = gen_mla64_i64, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_64 }, + }; + static const GVecGen3 mls_op[4] = { + { .fni4 = gen_mls8_i32, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_8 }, + { .fni4 = gen_mls16_i32, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_16 }, + { .fni4 = gen_mls32_i32, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_32 }, + { .fni8 = gen_mls64_i64, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_64 }, + }; + gvec_op = (u ? &mls_op[size] : &mla_op[size]); + } + goto do_gvec_op; case 0x11: if (u) { /* CMEQ */ cond = TCG_COND_EQ; @@ -9771,12 +9890,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) .prefer_i64 = TCG_TARGET_REG_BITS == 64, .vece = MO_64 }, }; - tcg_gen_gvec_3(vec_full_reg_offset(s, rd), - vec_full_reg_offset(s, rn), - vec_full_reg_offset(s, rm), - is_q ? 16 : 8, vec_full_reg_size(s), - &cmtst_op[size]); + gvec_op = &cmtst_op[size]; } + do_gvec_op: + tcg_gen_gvec_3(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), gvec_op); return; case 0x06: /* CMGT, CMHI */ cond = u ? TCG_COND_GTU : TCG_COND_GT; @@ -9944,23 +10064,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) break; } case 0x13: /* MUL, PMUL */ - if (u) { - /* PMUL */ - assert(size == 0); - genfn = gen_helper_neon_mul_p8; - break; - } - /* fall through : MUL */ - case 0x12: /* MLA, MLS */ - { - static NeonGenTwoOpFn * const fns[3] = { - gen_helper_neon_mul_u8, - gen_helper_neon_mul_u16, - tcg_gen_mul_i32, - }; - genfn = fns[size]; + assert(u); /* PMUL */ + assert(size == 0); + genfn = gen_helper_neon_mul_p8; break; - } case 0x16: /* SQDMULH, SQRDMULH */ { static NeonGenTwoOpEnvFn * const fns[2][2] = { @@ -9981,18 +10088,16 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn(tcg_res, tcg_op1, tcg_op2); } - if (opcode == 0xf || opcode == 0x12) { - /* SABA, UABA, MLA, MLS: accumulating ops */ - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 }, - { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 }, - { tcg_gen_add_i32, tcg_gen_sub_i32 }, + if (opcode == 0xf) { + /* SABA, UABA: accumulating ops */ + static NeonGenTwoOpFn * const fns[3] = { + gen_helper_neon_add_u8, + gen_helper_neon_add_u16, + tcg_gen_add_i32, }; - bool is_sub = (opcode == 0x12 && u); /* MLS */ - genfn = fns[size][is_sub]; read_vec_element_i32(s, tcg_op1, rd, pass, MO_32); - genfn(tcg_res, tcg_op1, tcg_res); + fns[size](tcg_res, tcg_op1, tcg_res); } write_vec_element_i32(s, tcg_res, rd, pass, MO_32); From patchwork Mon Dec 18 17:17:51 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122265 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3120791qgn; Mon, 18 Dec 2017 09:39:18 -0800 (PST) X-Google-Smtp-Source: ACJfBouB7BzfGwvLykiLTNZdMC5s+XvxfmQ8LOo5RBKamgWTQrIstdhFKK50EmRMvyACT+il7ijC X-Received: by 10.37.184.15 with SMTP id v15mr453441ybj.334.1513618758003; Mon, 18 Dec 2017 09:39:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618757; cv=none; d=google.com; s=arc-20160816; b=R6UsFuO2cZGs2rqgIN/uWEvQhpv9yT3gDGIuicsUYsCxNTLgpH/+FjYNdfSpy1l0d6 xpOhdTHU6ZLj0lDjHoX/RItwMou8ifuKzXhHwGzepISTkLmSNjQgl7+DbgU2FcAe5QUd toNPiDmihg9y3xrtu1ayhGAb9tpgWlbgcXnQo0T+OCMd39WAD66zSMq2yGbxyJOYJ1+M /J2V2cvZUV9s16hX5acV5vY4nf5ign+SflGjoLtysLHYtzGZ5otDJf+O4r0wk8G3KBSf l6uXrMoRnGCHQhdNL+w5j1u8RfUIbmjFB5TnMyyVgsxLmnRPZ+NJp1AgepOsPOfrTXMw 3sVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=im8yvfCpMBlL+ipeYvi7ZEiXxepmauyiRpBAcCtQV8E=; b=Lirg06aYayMv2lpiOvjLb+v7JoBvXKJ6+zkhaO0zYvpemlNpcrLm2xpyClTgu21ecW tG8H+BpkljZUj8RspwwJafBlcAarNqvWe/IzT9Q0Wh6sty+K6526gJs5yDjZ3YBDvY8Y q+BoCnm3G/EV22W5lEZOgiaO5UjPzxzSYcFpTu+qzQqP1hBYMj9Yw0kZ16S5XYiHAhLD xNIAuz2fz1Dda76OIgCC407tFC9RyQMxI7COTnQRaZ5QBwvGiNzmNO7XKlF+r37KfEhX 2l9x61qV1p6jeccjGlfkhjipGl76KFYP1flZAuOvuSfvytWGamd0lFJbXmZqFpyWZh94 05FA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=bjQzCKcB; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id x129si2505972ywf.17.2017.12.18.09.39.17 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:39:17 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=bjQzCKcB; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58799 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzNl-00017F-E9 for patch@linaro.org; Mon, 18 Dec 2017 12:39:17 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37475) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3k-0000dS-C9 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3g-0001MT-Eg for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:36 -0500 Received: from mail-pl0-x241.google.com ([2607:f8b0:400e:c01::241]:41826) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3g-0001LO-05 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:32 -0500 Received: by mail-pl0-x241.google.com with SMTP id g2so5200769pli.8 for ; Mon, 18 Dec 2017 09:18:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=im8yvfCpMBlL+ipeYvi7ZEiXxepmauyiRpBAcCtQV8E=; b=bjQzCKcB/rs4tnkBT60rU7TtrFIAYmjL4WQ0+8HM9vQroKckwC6jLwVGZmmOZ0yhgi t0VcU2K3eef6dn809iPIL6qB5d6qGOfrQ5OBGWvkoE7gIzqzgDcargiaa8iF0jev6GSU ClspeAweYJ0G/N4fjn1Xw0sulU3TQlqnc/yes= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=im8yvfCpMBlL+ipeYvi7ZEiXxepmauyiRpBAcCtQV8E=; b=OZIxErsfw6QrOVRAx5NPlj1Vt0RShXn5kv1to1Wd5n9dTCCWJ5ofRKsPbiFljD217c G0FSkbjnYVatwVGZuNC0OmcBaApBqPRM2Smb0CImLnrW8KdBt34f/0IS02BV/IqRTi6q ZiispD0m1cjtZ5aypBVgNQQa8uHDudcSXf74iFt9r3oU74sx5WmM4bn9gECGE386TqTv nhzik6+vSDhClr89aAnSuQclqHY/SAwW6N8Oxf1M9r3tSsEQcg+JmJOPBIBwZs/zXynU ZGpxEjKMHVyX4NuiafRpwE+fLAEuDyv/LgZkb+EiwQH+k6PTciHl9q2AWxz2WbjmEAe5 iC7w== X-Gm-Message-State: AKGB3mKvPm5VyG9o3r87ntXg/YmVnAB9G7Owa5yJ2Aa8Y9S4cjs6+Jb5 fuMEGGwQjK5Bu5QAtnJK5FlR/yDUv30= X-Received: by 10.84.241.67 with SMTP id u3mr390838plm.275.1513617510637; Mon, 18 Dec 2017 09:18:30 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.29 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:29 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:51 -0800 Message-Id: <20171218171758.16964-20-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::241 Subject: [Qemu-devel] [PATCH v7 19/26] tcg: Add generic vector ops for extension X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 8 +++ tcg/i386/tcg-target.h | 2 + tcg/tcg-op-gvec.h | 9 +++ tcg/tcg-op.h | 5 ++ tcg/tcg-opc.h | 5 ++ tcg/tcg.h | 2 + accel/tcg/tcg-runtime-gvec.c | 26 ++++++++ tcg/tcg-op-gvec.c | 138 +++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 39 ++++++++++++ tcg/tcg.c | 6 ++ tcg/README | 13 ++++ 11 files changed, 253 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index c4a2e6b215..d1b3542946 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -199,6 +199,14 @@ DEF_HELPER_FLAGS_4(gvec_trn16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_extu8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_extu16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_extu32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_exts8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_exts16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_exts32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index 949d138c9d..fedc3449c1 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -185,6 +185,8 @@ extern bool have_avx2; #define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_HAS_cmp_vec 1 #define TCG_TARGET_HAS_mul_vec 0 +#define TCG_TARGET_HAS_extl_vec 0 +#define TCG_TARGET_HAS_exth_vec 0 #define TCG_TARGET_deposit_i32_valid(ofs, len) \ (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \ diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index ad5e22e1bf..188c3368bd 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -180,6 +180,15 @@ void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_extul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_extuh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_extsl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_extsh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz); + void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 3296a7baa5..a722c400c2 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -940,6 +940,11 @@ void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_extul_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_extuh_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_extsl_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_extsh_vec(unsigned vece, TCGv_vec r, TCGv_vec a); + void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index b21a30273c..3dfd872a0f 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -249,6 +249,11 @@ DEF(uzpo_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) DEF(trne_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) DEF(trno_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) +DEF(extul_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_extl_vec)) +DEF(extuh_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_exth_vec)) +DEF(extsl_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_extl_vec)) +DEF(extsh_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_exth_vec)) + DEF(cmp_vec, 1, 2, 1, IMPLVEC) DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) diff --git a/tcg/tcg.h b/tcg/tcg.h index 64078a2e92..737550385d 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -186,6 +186,8 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_HAS_cmp_vec 0 #define TCG_TARGET_HAS_mul_vec 0 +#define TCG_TARGET_HAS_extl_vec 0 +#define TCG_TARGET_HAS_exth_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index 9406ccd769..ff26be0744 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -588,3 +588,29 @@ DO_CMP2(8) DO_CMP2(16) DO_CMP2(32) DO_CMP2(64) + +#define DO_EXT(NAME, TYPE1, TYPE2) \ +void HELPER(NAME)(void *d, void *a, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t oprsz_2 = oprsz / 2; \ + intptr_t i; \ + /* We produce output faster than we consume input. \ + Therefore we must be mindful of possible overlap. */ \ + if (unlikely((a - d) < (uintptr_t)oprsz)) { \ + void *a_new = alloca(oprsz_2); \ + memcpy(a_new, a, oprsz_2); \ + a = a_new; \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE1)) { \ + *(TYPE2 *)(d + 2 * i) = *(TYPE1 *)(a + i); \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_EXT(gvec_extu8, uint8_t, uint16_t) +DO_EXT(gvec_extu16, uint16_t, uint32_t) +DO_EXT(gvec_extu32, uint32_t, uint64_t) +DO_EXT(gvec_exts8, int8_t, int16_t) +DO_EXT(gvec_exts16, int16_t, int32_t) +DO_EXT(gvec_exts32, int32_t, int64_t) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index e6d885da66..8ec1817727 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1952,3 +1952,141 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, } tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, fn); } + +static void do_ext(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, bool high, bool is_sign) +{ + static gen_helper_gvec_2 * const extu_fn[3] = { + gen_helper_gvec_extu8, gen_helper_gvec_extu16, gen_helper_gvec_extu32 + }; + static gen_helper_gvec_2 * const exts_fn[3] = { + gen_helper_gvec_exts8, gen_helper_gvec_exts16, gen_helper_gvec_exts32 + }; + + TCGType type; + uint32_t step, i, n; + TCGOpcode opc; + + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, oprsz); + tcg_debug_assert(vece < MO_64); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > 4 * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + opc = is_sign ? (high ? INDEX_op_extsh_vec : INDEX_op_extsl_vec) + : (high ? INDEX_op_extuh_vec : INDEX_op_extul_vec); + + /* Since these operations don't operate in lock-step lanes, + we must care for overlap. */ + if (TCG_TARGET_HAS_v256 && oprsz % 32 == 0 && oprsz / 32 <= 8 + && tcg_can_emit_vec_op(opc, TCG_TYPE_V256, vece)) { + type = TCG_TYPE_V256; + step = 32; + n = oprsz / 32; + } else if (TCG_TARGET_HAS_v128 && oprsz % 16 == 0 && oprsz / 16 <= 8 + && tcg_can_emit_vec_op(opc, TCG_TYPE_V128, vece)) { + type = TCG_TYPE_V128; + step = 16; + n = oprsz / 16; + } else if (TCG_TARGET_HAS_v64 && oprsz % 8 == 0 && oprsz / 8 <= 8 + && tcg_can_emit_vec_op(opc, TCG_TYPE_V64, vece)) { + type = TCG_TYPE_V64; + step = 8; + n = oprsz / 8; + } else { + goto do_ool; + } + + if (n == 1) { + TCGv_vec t1 = tcg_temp_new_vec(type); + + tcg_gen_ld_vec(t1, cpu_env, aofs); + if (high) { + if (is_sign) { + tcg_gen_extsh_vec(vece, t1, t1); + } else { + tcg_gen_extuh_vec(vece, t1, t1); + } + } else { + if (is_sign) { + tcg_gen_extsl_vec(vece, t1, t1); + } else { + tcg_gen_extul_vec(vece, t1, t1); + } + } + tcg_gen_st_vec(t1, cpu_env, dofs); + tcg_temp_free_vec(t1); + } else { + TCGv_vec ta[4], tmp; + + if (high) { + aofs += oprsz / 2; + } + + for (i = 0; i < (n / 2 + n % 2); ++i) { + ta[i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(ta[i], cpu_env, aofs + i * step); + } + + tmp = tcg_temp_new_vec(type); + for (i = 0; i < n; ++i) { + if (i & 1) { + if (is_sign) { + tcg_gen_extsh_vec(vece, tmp, ta[i / 2]); + } else { + tcg_gen_extuh_vec(vece, tmp, ta[i / 2]); + } + } else { + if (is_sign) { + tcg_gen_extsl_vec(vece, tmp, ta[i / 2]); + } else { + tcg_gen_extul_vec(vece, tmp, ta[i / 2]); + } + } + tcg_gen_st_vec(tmp, cpu_env, dofs + i * step); + } + tcg_temp_free_vec(tmp); + + for (i = 0; i < (n / 2 + n % 2); ++i) { + tcg_temp_free_vec(ta[i]); + } + } + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + return; + + do_ool: + if (high) { + aofs += oprsz / 2; + } + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, + is_sign ? exts_fn[vece] : extu_fn[vece]); +} + +void tcg_gen_gvec_extul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, false, false); +} + +void tcg_gen_gvec_extuh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, true, false); +} + +void tcg_gen_gvec_extsl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, false, true); +} + +void tcg_gen_gvec_extsh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, true, true); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index 667e5095cb..3e7c525796 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -525,3 +525,42 @@ void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) tcg_expand_vec_op(INDEX_op_mul_vec, type, vece, ri, ai, bi); } } + +static void do_ext(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + can = tcg_can_emit_vec_op(opc, type, vece); + if (can > 0) { + vec_gen_2(opc, type, vece, ri, ai); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(opc, type, vece, ri, ai); + } +} + +void tcg_gen_extul_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extul_vec, vece, r, a); +} + +void tcg_gen_extuh_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extuh_vec, vece, r, a); +} + +void tcg_gen_extsl_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extsl_vec, vece, r, a); +} + +void tcg_gen_extsh_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extsh_vec, vece, r, a); +} diff --git a/tcg/tcg.c b/tcg/tcg.c index 5608391dca..8c0ee0a9db 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1427,6 +1427,12 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_trne_vec: case INDEX_op_trno_vec: return have_vec && TCG_TARGET_HAS_trn_vec; + case INDEX_op_extul_vec: + case INDEX_op_extsl_vec: + return have_vec && TCG_TARGET_HAS_extl_vec; + case INDEX_op_extuh_vec: + case INDEX_op_extsh_vec: + return have_vec && TCG_TARGET_HAS_exth_vec; default: tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS); diff --git a/tcg/README b/tcg/README index 17695ff7f6..56c70764bc 100644 --- a/tcg/README +++ b/tcg/README @@ -634,6 +634,19 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. v0[2i + 1] = v2[2i + part]; } +* extul_vec v0, v1 + + Extend unsigned the low VECL/VECE/2 elements of v1 into v0. + +* extuh_vec v0, v1 + + Similarly for the high VECL/VECE/2 elements. + +* extsl_vec v0, v1 +* extsh_vec v0, v1 + + Similarly with signed extension. + * cmp_vec v0, v1, v2, cond Compare vectors by element, storing -1 for true and 0 for false. From patchwork Mon Dec 18 17:17:52 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122260 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3116859qgn; Mon, 18 Dec 2017 09:35:22 -0800 (PST) X-Google-Smtp-Source: ACJfBouxaM5hNmTN+0aiERSx98S/zEta9P68GVocV9jNHbHI/H6UHQ1NQebQffiAbXSKH5vojAel X-Received: by 10.37.29.131 with SMTP id d125mr448030ybd.440.1513618522109; Mon, 18 Dec 2017 09:35:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618522; cv=none; d=google.com; s=arc-20160816; b=vGol3g+bnqr6Irp1RrLOJaIjMbewcw0bIm6cbiCj39oNmO7h4cQm0cP7F4b3AYiR+N +w9gqAPwxUNlLs2dy0wVpAUpD2aw8illOdb/0WKqR8tehk/1Cf0rH4IeSsJF4mFaKEbk vbu+7dwaWXqN158CRynsyVIxb+HMPtaks+DKUnyKa9EkPZIvNeONUSoX9CLWGWnTP+mi ONq4N/McY1Vg9HJDxVrdCkzIGNM4JAsn3pPPHJD4LmvpoV9SF2L0ulf9kEFVOG14zWzF VbnTACiacuqXP9V1AydmFX9a3Z5xBO4c70a3vZ9r/UtcuJhpMo5FH2/I0q7d7XrTiUsL yYjg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=UGxD5sTymPbXgAXlALlEt0lMlOhbok+QGJLeIBCttPY=; b=AOPEMlfAiNeofYDgQtii5Gh98IHEAOUuXxwU5vXFUXfcZ/RvAiz4Gzgr5U9PPdqWJH 0/pDaRQOeuLkT9bowGnwJVkx6rW+ruNHwiy4L2jbJMrgocyuw8xf53xOCtpwxd3GiLA2 KUPcYY4W+Cwwaiss3VAyXEgV8STf82JJRp+XwglyAtwaALUDQqDuTthf9X5rQXig9CBu 1fdRxb/8SJdO4Bh01gQSgEmXjr5BgYh+j5tGnYcqvCM7STNkKjw3g+9cYNFBfjP8WEgn 7iAYQ5zLs9WyIQtWj6h5VuhH0qT2KTJ5a7icpQ0yVFgfYs33aDVy1lohaGnAwtAYIo4u eigw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=G9AKkKu8; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id g8si2506665ybk.226.2017.12.18.09.35.21 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:35:22 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=G9AKkKu8; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58643 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzJx-0006Bq-Ij for patch@linaro.org; Mon, 18 Dec 2017 12:35:21 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37476) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3k-0000dV-C6 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:37 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3h-0001Nj-NS for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:36 -0500 Received: from mail-pl0-x241.google.com ([2607:f8b0:400e:c01::241]:33600) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3h-0001N0-GJ for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:33 -0500 Received: by mail-pl0-x241.google.com with SMTP id 1so3743222plv.0 for ; Mon, 18 Dec 2017 09:18:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=UGxD5sTymPbXgAXlALlEt0lMlOhbok+QGJLeIBCttPY=; b=G9AKkKu81F8LfAqx0/vuTf8o1yIWiEqTsEilz9LN/p/zeAkNSO+oei2YXPLe/Psfht mQ25dvbxEsejmXq9/UWK06VmoTSuAlgIzj5wJC+ewyfqwC9S5fbQ4TaXAOm8+JL3xbL/ n1WTB2OJnfrd0iVSsj45Hc2MQfQZUK5dJf8Dw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=UGxD5sTymPbXgAXlALlEt0lMlOhbok+QGJLeIBCttPY=; b=qGMNun2McR1cQKtNwBTYZKOEFbVJ6o93HmGjepe1ozp0DcxZUgBPV/t+OXGxEwNbFG KhlrYaAWaMqATvuabdD/9SeIKWAaICHe2iHLYdOE4edmXHgyMALDOX8jpl3FmT5AZ/jt 9egRmOYTQNmJsfZtC0b6yRlIMFC8AwKM31AtyMHUQOxxSX6BFm1z9ET6LQcqYH+swDqq PyECQMBCTyRzbZvYX4D+JkoerQDR8c2Eew4yUkJmkZXWQk+9WacWUBh3xQgOMUeW5gm9 xr4YZEBIYA8mI7UoDwBF9ihfETxvsS5A1lpqXtPkuRq5QEGzTLYN+E9cZ4tXC/z3hRox t8pg== X-Gm-Message-State: AKGB3mKhHWaAz5Su7qLacLJXnXprIkiuEVcnI1DvcM8R3yf++ygqBaBt WXEoZ3YU7Q28g7ZpFJU9/NLFws+/O00= X-Received: by 10.84.194.163 with SMTP id h32mr367260pld.335.1513617512218; Mon, 18 Dec 2017 09:18:32 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.30 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:31 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:52 -0800 Message-Id: <20171218171758.16964-21-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::241 Subject: [Qemu-devel] [PATCH v7 20/26] target/arm: Use vector infrastructure for aa64 widening shifts X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index c47d9caa49..1f7e9c4e19 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -8705,12 +8705,7 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, int size = 32 - clz32(immh) - 1; int immhb = immh << 3 | immb; int shift = immhb - (8 << size); - int dsize = 64; - int esize = 8 << size; - int elements = dsize/esize; - TCGv_i64 tcg_rn = new_tmp_a64(s); - TCGv_i64 tcg_rd = new_tmp_a64(s); - int i; + GVecGen2Fn *gvec_fn; if (size >= 3) { unallocated_encoding(s); @@ -8721,18 +8716,18 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, return; } - /* For the LL variants the store is larger than the load, - * so if rd == rn we would overwrite parts of our input. - * So load everything right now and use shifts in the main loop. - */ - read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64); - - for (i = 0; i < elements; i++) { - tcg_gen_shri_i64(tcg_rd, tcg_rn, i * esize); - ext_and_shift_reg(tcg_rd, tcg_rd, size | (!is_u << 2), 0); - tcg_gen_shli_i64(tcg_rd, tcg_rd, shift); - write_vec_element(s, tcg_rd, rd, i, size + 1); + if (is_u) { + gvec_fn = is_q ? tcg_gen_gvec_extuh : tcg_gen_gvec_extul; + } else { + gvec_fn = is_q ? tcg_gen_gvec_extsh : tcg_gen_gvec_extsl; } + gvec_fn(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), 16, 16); + + /* Perform the shift in the wider format. */ + tcg_gen_gvec_shli(size + 1, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rd), + 16, vec_full_reg_size(s), shift); } /* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */ From patchwork Mon Dec 18 17:17:53 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122264 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3120327qgn; Mon, 18 Dec 2017 09:38:50 -0800 (PST) X-Google-Smtp-Source: ACJfBovi0fJlWERbrwp5m6gxR1OScoC8CGeePl6og1eLl78MmNfY/9hsFJqWRNXifT1yUK0eViUz X-Received: by 10.37.164.7 with SMTP id f7mr435336ybi.413.1513618730674; Mon, 18 Dec 2017 09:38:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618730; cv=none; d=google.com; s=arc-20160816; b=JibFfCqoEBJs2EmcbOq9JusYTbJhSUml8xou/wBDdFbhc1yZNz+aHfr0QKU7RxvRKh tdm3NAJ2HNkbp5fLzASe1uIj4T71jvYc0IDZuav3UtppEeaxleqzC1IqUGRq3wcqqncr IxQtCxslXhmZNGDAyMnUAMClkDtnIsJMp2G+SkKfsjEQsgQY7qtISFt9nmbL5WJPvUia +AgB7+m4YmjcwF7JaVwvXHaXQeureXzCYWwdYKH1GlpcG7rsav21BonsopRMOBAXnxxn 6EyCvi7WxO28HVnnj9cdWmzbix2DHstjX6HC1WoZs8gpEVYkgpZ6C+iRVEVlx6/AHpEP zIRg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=lcln0V3poKkwsQWNWEdriZPW+BPr4TRPaGktDD1IgpA=; b=LjmajmHGU4+e1RHMJ1icdKUaa6dP292kOwW18Lu5Cl16wDl2Dg/UrmV7VqV+GIQQiE D5/LmzzeUhUXBksI5GAhKSG7wvSNH4DJkUEvSTu5qcLiHEg9VVn92GR6vCZNpZVf9WFI wz3j2JlGxBkoD7DVPZTc6Unpxs6VGlC2B33yVkKr9NG19kemTNtz+xHMzQZANjXlEBLx WrIFtVxLislFZO3xMN7T8QmNjo8v1UoclYKuzAFTKC90k/Ppkl3QybfOrG3uZP/GlHAS +vwCbQnqVJ48HdVFd6XSs+0/GbBKnv46sYpgxG2qRBAo1YhX1QxGaxPyqa6c7j8qLhGd OlYg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=I6b2V1py; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id p63si2530716ywd.155.2017.12.18.09.38.50 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:38:50 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=I6b2V1py; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58782 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzNK-0000l7-5F for patch@linaro.org; Mon, 18 Dec 2017 12:38:50 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37496) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3l-0000eO-6T for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3j-0001Pl-BX for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:37 -0500 Received: from mail-pg0-x243.google.com ([2607:f8b0:400e:c05::243]:36824) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3j-0001On-1y for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:35 -0500 Received: by mail-pg0-x243.google.com with SMTP id k134so9403046pga.3 for ; Mon, 18 Dec 2017 09:18:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=lcln0V3poKkwsQWNWEdriZPW+BPr4TRPaGktDD1IgpA=; b=I6b2V1pyBRlsNUaWtlaq/jJ+BNQ7XSvWWcneoLm3JalYSc16nIXaX5qg7qk4m/QHdu 3pwZ0INoMEUFB9nM7XsbMEXV/6ty/r8NVLaAj/4FOU48y+yZGyFyMyMzeQjph0NzIHcs jTWbngpgxD4IksJAmmMC8WSWsYOccJwA1JmDM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=lcln0V3poKkwsQWNWEdriZPW+BPr4TRPaGktDD1IgpA=; b=jh84cWQNVHKWTY0E4tkqdSwBO14VfDil2woA/kikk7gnbFfZPMpHcxKHDxZEgWvr3Y ReYoxE59lf8YloD3F8Tn/0KBrjZAcR/1B/20wITlWVXNHu9jPRmIg7YMy/JGaV0E3NW9 +Sd8YT3ETOwQgRTvp3Zn7r5WEY4ygdwBXpKauWmGGYlM50ijmG3HgXW6eV3gl1mwTTsw 3Ow0GSqgD0Ih+XVPGSzWavIcPob+QqU1Rf4KGxLGdlxHXXtFHRQfeA+NQLKMH+mSBOUP SAEqZNRU8Rz79zOcDSEko3bFLDXXzlttW8VUIF6Sz5hXmpLjFabdhpynqtbai8MQ6ekn 3y2A== X-Gm-Message-State: AKGB3mL55xwZYKjdnzhXqdXuyC85MP8TGoV3uT0eDdduQmvxeeSzqgAT LGPY18g0lSMw/ZmhiIlomrJxYQlvyr0= X-Received: by 10.98.201.29 with SMTP id k29mr408764pfg.27.1513617513766; Mon, 18 Dec 2017 09:18:33 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.32 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:32 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:53 -0800 Message-Id: <20171218171758.16964-22-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::243 Subject: [Qemu-devel] [PATCH v7 21/26] tcg/i386: Add vector operations/expansions for mul/extend X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- tcg/i386/tcg-target.h | 4 +- tcg/i386/tcg-target.opc.h | 1 + tcg/i386/tcg-target.inc.c | 186 ++++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 174 insertions(+), 17 deletions(-) -- 2.14.3 diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index fedc3449c1..e77b95cc2c 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -184,8 +184,8 @@ extern bool have_avx2; #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_HAS_cmp_vec 1 -#define TCG_TARGET_HAS_mul_vec 0 -#define TCG_TARGET_HAS_extl_vec 0 +#define TCG_TARGET_HAS_mul_vec 1 +#define TCG_TARGET_HAS_extl_vec 1 #define TCG_TARGET_HAS_exth_vec 0 #define TCG_TARGET_deposit_i32_valid(ofs, len) \ diff --git a/tcg/i386/tcg-target.opc.h b/tcg/i386/tcg-target.opc.h index 77125ef818..5f05df65e0 100644 --- a/tcg/i386/tcg-target.opc.h +++ b/tcg/i386/tcg-target.opc.h @@ -8,3 +8,4 @@ DEF(x86_blend_vec, 1, 2, 1, IMPLVEC) DEF(x86_packss_vec, 1, 2, 0, IMPLVEC) DEF(x86_packus_vec, 1, 2, 0, IMPLVEC) DEF(x86_psrldq_vec, 1, 1, 1, IMPLVEC) +DEF(x86_vperm2i128_vec, 1, 2, 1, IMPLVEC) diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index 694d9e5cb5..e61aeebf3e 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -393,6 +393,14 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_PCMPGTW (0x65 | P_EXT | P_DATA16) #define OPC_PCMPGTD (0x66 | P_EXT | P_DATA16) #define OPC_PCMPGTQ (0x37 | P_EXT38 | P_DATA16) +#define OPC_PMOVSXBW (0x20 | P_EXT38 | P_DATA16) +#define OPC_PMOVSXWD (0x23 | P_EXT38 | P_DATA16) +#define OPC_PMOVSXDQ (0x25 | P_EXT38 | P_DATA16) +#define OPC_PMOVZXBW (0x30 | P_EXT38 | P_DATA16) +#define OPC_PMOVZXWD (0x33 | P_EXT38 | P_DATA16) +#define OPC_PMOVZXDQ (0x35 | P_EXT38 | P_DATA16) +#define OPC_PMULLW (0xd5 | P_EXT | P_DATA16) +#define OPC_PMULLD (0x40 | P_EXT38 | P_DATA16) #define OPC_POR (0xeb | P_EXT | P_DATA16) #define OPC_PSHUFB (0x00 | P_EXT38 | P_DATA16) #define OPC_PSHUFD (0x70 | P_EXT | P_DATA16) @@ -2675,6 +2683,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static int const sub_insn[4] = { OPC_PSUBB, OPC_PSUBW, OPC_PSUBD, OPC_PSUBQ }; + static int const mul_insn[4] = { + OPC_UD2, OPC_PMULLW, OPC_PMULLD, OPC_UD2 + }; static int const shift_imm_insn[4] = { OPC_UD2, OPC_PSHIFTW_Ib, OPC_PSHIFTD_Ib, OPC_PSHIFTQ_Ib }; @@ -2690,6 +2701,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, static int const punpckh_insn[4] = { OPC_PUNPCKHBW, OPC_PUNPCKHWD, OPC_PUNPCKHDQ, OPC_PUNPCKHQDQ }; + static int const packss_insn[4] = { + OPC_PACKSSWB, OPC_PACKSSDW, OPC_UD2, OPC_UD2 + }; + static int const packus_insn[4] = { + OPC_PACKUSWB, OPC_PACKUSDW, OPC_UD2, OPC_UD2 + }; + static int const pmovsx_insn[3] = { + OPC_PMOVSXBW, OPC_PMOVSXWD, OPC_PMOVSXDQ + }; + static int const pmovzx_insn[3] = { + OPC_PMOVZXBW, OPC_PMOVZXWD, OPC_PMOVZXDQ + }; TCGType type = vecl + TCG_TYPE_V64; int insn, sub; @@ -2706,6 +2729,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, case INDEX_op_sub_vec: insn = sub_insn[vece]; goto gen_simd; + case INDEX_op_mul_vec: + insn = mul_insn[vece]; + goto gen_simd; case INDEX_op_and_vec: insn = OPC_PAND; goto gen_simd; @@ -2722,30 +2748,33 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, insn = punpckh_insn[vece]; goto gen_simd; case INDEX_op_x86_packss_vec: - if (vece == MO_8) { - insn = OPC_PACKSSWB; - } else if (vece == MO_16) { - insn = OPC_PACKSSDW; - } else { - g_assert_not_reached(); - } + insn = packss_insn[vece]; goto gen_simd; case INDEX_op_x86_packus_vec: - if (vece == MO_8) { - insn = OPC_PACKUSWB; - } else if (vece == MO_16) { - insn = OPC_PACKUSDW; - } else { - g_assert_not_reached(); - } + insn = packus_insn[vece]; goto gen_simd; gen_simd: + tcg_debug_assert(insn != OPC_UD2); if (type == TCG_TYPE_V256) { insn |= P_VEXL; } tcg_out_vex_modrm(s, insn, a0, a1, a2); break; + case INDEX_op_extsl_vec: + insn = pmovsx_insn[vece]; + goto gen_simd2; + case INDEX_op_extul_vec: + insn = pmovzx_insn[vece]; + goto gen_simd2; + gen_simd2: + tcg_debug_assert(vece < MO_64); + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, 0, a1); + break; + case INDEX_op_cmp_vec: sub = args[3]; if (sub == TCG_COND_EQ) { @@ -2811,6 +2840,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, } sub = args[3]; goto gen_simd_imm8; + case INDEX_op_x86_vperm2i128_vec: + insn = OPC_VPERM2I128; + sub = args[3]; + goto gen_simd_imm8; gen_simd_imm8: if (type == TCG_TYPE_V256) { insn |= P_VEXL; @@ -3073,6 +3106,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) case INDEX_op_add_vec: case INDEX_op_sub_vec: + case INDEX_op_mul_vec: case INDEX_op_and_vec: case INDEX_op_or_vec: case INDEX_op_xor_vec: @@ -3084,11 +3118,14 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) case INDEX_op_x86_blend_vec: case INDEX_op_x86_packss_vec: case INDEX_op_x86_packus_vec: + case INDEX_op_x86_vperm2i128_vec: return &x_x_x; case INDEX_op_dup_vec: case INDEX_op_shli_vec: case INDEX_op_shri_vec: case INDEX_op_sari_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extul_vec: case INDEX_op_x86_psrldq_vec: return &x_x; case INDEX_op_x86_vpblendvb_vec: @@ -3109,8 +3146,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_or_vec: case INDEX_op_xor_vec: case INDEX_op_andc_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extul_vec: return 1; case INDEX_op_cmp_vec: + case INDEX_op_extsh_vec: + case INDEX_op_extuh_vec: return -1; case INDEX_op_shli_vec: @@ -3130,6 +3171,16 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) } return 1; + case INDEX_op_mul_vec: + if (vece == MO_8) { + /* We can expand the operation for MO_8. */ + return -1; + } + if (vece == MO_64) { + return 0; + } + return 1; + case INDEX_op_zipl_vec: /* We could support v256, but with 3 insns per opcode. It is better to expand with v128 instead. */ @@ -3157,7 +3208,7 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, { va_list va; TCGArg a1, a2; - TCGv_vec v0, v1, v2, t1, t2; + TCGv_vec v0, v1, v2, t1, t2, t3, t4; va_start(va, a0); v0 = temp_tcgv_vec(arg_temp(a0)); @@ -3248,6 +3299,91 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, tcg_temp_free_vec(t1); break; + case INDEX_op_mul_vec: + tcg_debug_assert(vece == MO_8); + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + switch (type) { + case TCG_TYPE_V64: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup16i_vec(t2, 0); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t1), a1, tcgv_vec_arg(t2)); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2); + tcg_gen_mul_vec(MO_16, t1, t1, t2); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + + case TCG_TYPE_V128: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + t3 = tcg_temp_new_vec(TCG_TYPE_V128); + t4 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup16i_vec(t4, 0); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t1), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t2), tcgv_vec_arg(t4), a2); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t3), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t4), tcgv_vec_arg(t4), a2); + tcg_gen_mul_vec(MO_16, t1, t1, t2); + tcg_gen_mul_vec(MO_16, t3, t3, t4); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + tcg_gen_shri_vec(MO_16, t3, t3, 8); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t3)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t3); + tcg_temp_free_vec(t4); + break; + + case TCG_TYPE_V256: + t1 = tcg_temp_new_vec(TCG_TYPE_V256); + t2 = tcg_temp_new_vec(TCG_TYPE_V256); + t3 = tcg_temp_new_vec(TCG_TYPE_V256); + t4 = tcg_temp_new_vec(TCG_TYPE_V256); + tcg_gen_dup16i_vec(t4, 0); + /* a1: A[0-7] ... D[0-7]; a2: W[0-7] ... Z[0-7] + t1: extends of B[0-7], D[0-7] + t2: extends of X[0-7], Z[0-7] + t3: extends of A[0-7], C[0-7] + t4: extends of W[0-7], Y[0-7]. */ + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t1), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t2), tcgv_vec_arg(t4), a2); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t3), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t4), tcgv_vec_arg(t4), a2); + /* t1: BX DZ; t2: AW CY. */ + tcg_gen_mul_vec(MO_16, t1, t1, t2); + tcg_gen_mul_vec(MO_16, t3, t3, t4); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + tcg_gen_shri_vec(MO_16, t3, t3, 8); + /* a0: AW BX CY DZ. */ + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V256, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t3)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t3); + tcg_temp_free_vec(t4); + break; + + default: + g_assert_not_reached(); + } + break; + case INDEX_op_ziph_vec: tcg_debug_assert(type == TCG_TYPE_V64); a1 = va_arg(va, TCGArg); @@ -3256,6 +3392,26 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, vec_gen_3(INDEX_op_x86_psrldq_vec, TCG_TYPE_V128, MO_64, a0, a0, 8); break; + case INDEX_op_extsh_vec: + case INDEX_op_extuh_vec: + a1 = va_arg(va, TCGArg); + switch (type) { + case TCG_TYPE_V64: + vec_gen_3(INDEX_op_x86_psrldq_vec, type, MO_64, a0, a1, 4); + break; + case TCG_TYPE_V128: + vec_gen_3(INDEX_op_x86_psrldq_vec, type, MO_64, a0, a1, 8); + break; + case TCG_TYPE_V256: + vec_gen_4(INDEX_op_x86_vperm2i128_vec, type, 4, a0, a1, a1, 0x81); + break; + default: + g_assert_not_reached(); + } + vec_gen_2(opc == INDEX_op_extsh_vec ? INDEX_op_extsl_vec + : INDEX_op_extul_vec, type, vece, a0, a0); + break; + case INDEX_op_uzpe_vec: a1 = va_arg(va, TCGArg); a2 = va_arg(va, TCGArg); From patchwork Mon Dec 18 17:17:54 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122256 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3113048qgn; Mon, 18 Dec 2017 09:31:48 -0800 (PST) X-Google-Smtp-Source: ACJfBotyT6jYTFFZV9I1ZjV5RVVAGlEfTV32I5gq65mxkzcKwNfxKBWzapK1UrVM31ibvBDqD3u/ X-Received: by 10.129.121.129 with SMTP id u123mr414789ywc.462.1513618308152; Mon, 18 Dec 2017 09:31:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513618308; cv=none; d=google.com; s=arc-20160816; b=weVPzw+fTC2xYST/7EHAwp3nGAVQWpqoRMzyIlzHUzfuVng2DOGKXdwdte0VRMDbqC wPdrpVWb6Wmg/Y3ZtwAIhB9Xs2Vaz4Hhw5DH43VCDVPpDoZKuqahuxAjM7AqyROLmMMJ W74L6ysVcMzLdS+8RW30dCNLeoSPjILehWXGTActXNjO8KTY38OqUQouqJLyXPjxhZey 9gG+8vY//4jaTXiofbFq1iuCCLJTWzeZpOQxgAA65/njTqaXhiL9yck2HsU3Bex7TbF6 X5sgRk3qqhGs62o8BNH8yepQslymeNQw1emDoaihloyyPsaT2bGPjg+7iv2j3/s8hMie raxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=52x2COf8UwUEyQcq35qjtAcZZnCn7a8/X+asC2/WP3U=; b=kNnjv9v7e5exCq87b04zpIQ/c7/FkRYRAnUyOI8vx7p5lJl4i1u9qfAK+6YVdzIRe7 jo+7/dRfygFJk4OTkVmS9ZRbDAlOvwBOK0mSXqvtAniubg5PeurIEee2YX6m7FSVi2qK tEpHFRJuVehFEwUepGnCQ/XRVcoAAcRvZZqIqKs+t+tWiC6jrSW7vIxGy/MAQmpdW7Vd rdaMtjmN0X1NxWISS67mvTxZV9xeVms2XSaB0O8SJyv8ymJTQBiz1C5QFQkUVuXFr53v /oNCZ612DvJFTlYLJmbnV5cpUIvFs6UGHQtcysZnazcDNtLG2e2rS4pqspAzj3MGbSf4 RNoQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=jHKIa+MK; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id z63si462927ybf.315.2017.12.18.09.31.47 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:31:48 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=jHKIa+MK; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:58611 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzGV-0003Gz-HT for patch@linaro.org; Mon, 18 Dec 2017 12:31:47 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37568) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3p-0000i7-HQ for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:45 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3m-0001UW-3y for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:41 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:41829) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3l-0001Ta-Nl for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:38 -0500 Received: by mail-pl0-x244.google.com with SMTP id g2so5200932pli.8 for ; Mon, 18 Dec 2017 09:18:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=52x2COf8UwUEyQcq35qjtAcZZnCn7a8/X+asC2/WP3U=; b=jHKIa+MKAgWs7OArY7nwdY7Dleffo2Md0WWlzz5NHq0zvmekA5ZRtMx93tHXGsudUE dHueXQ6ePmClRQ0gAzp1RBlywU6uf2HhF+NlnkegQDgP0DO+VRgCogqFDXj4VXuLOnt9 3h5lLFWBgnIIvBgvsMLNf2txmvtpqdEKjt/bE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=52x2COf8UwUEyQcq35qjtAcZZnCn7a8/X+asC2/WP3U=; b=eUum5XxNk2AFYv8gETW9gzTISPMSkIkEHRuQXVPWB+jFLcCRAecwU7WLsQ9qzpqBBY iPfEDVj5jSJj0nQHAVjlofmMZNBa/2RDSj1PDCz0Nr4KnXqpBAONrZryVrlKBij8rIeJ urybHlXwIb/Ks29ubJuNfkNrLJP7tisMOVcaWm6mEztbfKSWZp60CvgchSj9kppRCpIH 1QJyU4io8BDC/pJsaUzT6v/E8D39tDVdhFcHp3XkClnDNFkMvRdX5W2/ce39H/yvFJmr 0fM+leDGoNhOIqIKB1JySe8f/qzLCm6979Lgmko/aha/Ou9xaloIIZK5XO5fBr/veyxI vxxA== X-Gm-Message-State: AKGB3mK8+yMSbfRwRlRzlU6eAC8IlUfkGJXXIgi+vqy6y7pm4hxSVmIt fb/uujdeWB14XYlvPQyR8z1OvTX8V9c= X-Received: by 10.84.178.129 with SMTP id z1mr383889plb.365.1513617516096; Mon, 18 Dec 2017 09:18:36 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.33 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:34 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:54 -0800 Message-Id: <20171218171758.16964-23-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v7 22/26] tcg/aarch64: Add vector operations X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- tcg/aarch64/tcg-target.h | 30 +- tcg/aarch64/tcg-target.opc.h | 3 + tcg/aarch64/tcg-target.inc.c | 674 ++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 660 insertions(+), 47 deletions(-) create mode 100644 tcg/aarch64/tcg-target.opc.h -- 2.14.3 diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h index c2525066ab..46434ecca4 100644 --- a/tcg/aarch64/tcg-target.h +++ b/tcg/aarch64/tcg-target.h @@ -31,13 +31,22 @@ typedef enum { TCG_REG_SP = 31, TCG_REG_XZR = 31, + TCG_REG_V0 = 32, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, + TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, + TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, + TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31, + /* Aliases. */ TCG_REG_FP = TCG_REG_X29, TCG_REG_LR = TCG_REG_X30, TCG_AREG0 = TCG_REG_X19, } TCGReg; -#define TCG_TARGET_NB_REGS 32 +#define TCG_TARGET_NB_REGS 64 /* used for function call generation */ #define TCG_REG_CALL_STACK TCG_REG_SP @@ -113,6 +122,25 @@ typedef enum { #define TCG_TARGET_HAS_mulsh_i64 1 #define TCG_TARGET_HAS_direct_jump 1 +#define TCG_TARGET_HAS_v64 1 +#define TCG_TARGET_HAS_v128 1 +#define TCG_TARGET_HAS_v256 0 + +#define TCG_TARGET_HAS_andc_vec 1 +#define TCG_TARGET_HAS_orc_vec 1 +#define TCG_TARGET_HAS_not_vec 1 +#define TCG_TARGET_HAS_neg_vec 1 +#define TCG_TARGET_HAS_shi_vec 1 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 +#define TCG_TARGET_HAS_zip_vec 1 +#define TCG_TARGET_HAS_uzp_vec 1 +#define TCG_TARGET_HAS_trn_vec 1 +#define TCG_TARGET_HAS_cmp_vec 1 +#define TCG_TARGET_HAS_mul_vec 1 +#define TCG_TARGET_HAS_extl_vec 1 +#define TCG_TARGET_HAS_exth_vec 1 + #define TCG_TARGET_DEFAULT_MO (0) static inline void flush_icache_range(uintptr_t start, uintptr_t stop) diff --git a/tcg/aarch64/tcg-target.opc.h b/tcg/aarch64/tcg-target.opc.h new file mode 100644 index 0000000000..4816a6c3d4 --- /dev/null +++ b/tcg/aarch64/tcg-target.opc.h @@ -0,0 +1,3 @@ +/* Target-specific opcodes for host vector expansion. These will be + emitted by tcg_expand_vec_op. For those familiar with GCC internals, + consider these to be UNSPEC with names. */ diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c index 150530f30e..b2ce818d7c 100644 --- a/tcg/aarch64/tcg-target.inc.c +++ b/tcg/aarch64/tcg-target.inc.c @@ -20,10 +20,15 @@ QEMU_BUILD_BUG_ON(TCG_TYPE_I32 != 0 || TCG_TYPE_I64 != 1); #ifdef CONFIG_DEBUG_TCG static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { - "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7", - "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15", - "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23", - "%x24", "%x25", "%x26", "%x27", "%x28", "%fp", "%x30", "%sp", + "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", + "x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15", + "x16", "x17", "x18", "x19", "x20", "x21", "x22", "x23", + "x24", "x25", "x26", "x27", "x28", "fp", "x30", "sp", + + "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", + "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", + "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", + "v24", "v25", "v26", "v27", "v28", "fp", "v30", "v31", }; #endif /* CONFIG_DEBUG_TCG */ @@ -43,6 +48,14 @@ static const int tcg_target_reg_alloc_order[] = { /* X19 reserved for AREG0 */ /* X29 reserved as fp */ /* X30 reserved as temporary */ + + TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, + TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, + /* V8 - V15 are call-saved, and skipped. */ + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31, }; static const int tcg_target_call_iarg_regs[8] = { @@ -54,6 +67,7 @@ static const int tcg_target_call_oarg_regs[1] = { }; #define TCG_REG_TMP TCG_REG_X30 +#define TCG_VEC_TMP TCG_REG_V31 #ifndef CONFIG_SOFTMMU /* Note that XZR cannot be encoded in the address base register slot, @@ -119,9 +133,13 @@ static const char *target_parse_constraint(TCGArgConstraint *ct, const char *ct_str, TCGType type) { switch (*ct_str++) { - case 'r': + case 'r': /* general registers */ ct->ct |= TCG_CT_REG; - ct->u.regs = 0xffffffffu; + ct->u.regs |= 0xffffffffu; + break; + case 'w': /* advsimd registers */ + ct->ct |= TCG_CT_REG; + ct->u.regs |= 0xffffffff00000000ull; break; case 'l': /* qemu_ld / qemu_st address, data_reg */ ct->ct |= TCG_CT_REG; @@ -178,6 +196,98 @@ static inline bool is_limm(uint64_t val) return (val & (val - 1)) == 0; } +static bool is_fimm(uint64_t v64, int *op, int *cmode, int *imm8) +{ + int i; + + *op = 0; + if (v64 == (-1ull / 0xff) * (v64 & 0xff)) { + *cmode = 0xe; + *imm8 = v64 & 0xff; + return true; + } + if (v64 == (-1ull / 0xffff) * (v64 & 0xffff)) { + uint64_t v16 = v64 & 0xffff; + + if (v16 == (v64 & 0xff)) { + *cmode = 0x8; + *imm8 = v64 & 0xff; + return true; + } else if (v16 == (v64 & 0xff00)) { + *cmode = 0xa; + *imm8 = v16 >> 8; + return true; + } + } + if (v64 == deposit64(v64, 32, 32, v64)) { + uint64_t v32 = (uint32_t)v64; + + if (v32 == (v64 & 0xff)) { + *cmode = 0x0; + *imm8 = v64 & 0xff; + return true; + } else if (v32 == (v32 & 0xff00)) { + *cmode = 0x2; + *imm8 = (v64 >> 8) & 0xff; + return true; + } else if (v32 == (v32 & 0xff0000)) { + *cmode = 0x4; + *imm8 = (v64 >> 16) & 0xff; + return true; + } else if (v32 == (v32 & 0xff000000)) { + *cmode = 0x6; + *imm8 = v32 >> 24; + return true; + } else if ((v32 & 0xffff00ff) == 0xff) { + *cmode = 0xc; + *imm8 = (v64 >> 8) & 0xff; + return true; + } else if ((v32 & 0xff00ffff) == 0xffff) { + *cmode = 0xd; + *imm8 = (v64 >> 16) & 0xff; + return true; + } else if (extract32(v32, 0, 19) == 0 + && (extract32(v32, 25, 6) == 0x20 + || extract32(v32, 25, 6) == 0x1f)) { + *cmode = 0xf; + *imm8 = (extract32(v32, 31, 1) << 7) + | (extract32(v32, 25, 1) << 6) + | extract32(v32, 19, 6); + return true; + } + } + if (extract64(v64, 0, 48) == 0 + && (extract64(v64, 54, 9) == 0x100 + || extract64(v64, 54, 9) == 0x0ff)) { + *cmode = 0xf; + *op = 1; + *imm8 = (extract64(v64, 63, 1) << 7) + | (extract64(v64, 54, 1) << 6) + | extract64(v64, 48, 6); + return true; + } + for (i = 0; i < 64; i += 8) { + uint64_t byte = extract64(v64, i, 8); + if (byte != 0 && byte != 0xff) { + break; + } + } + if (i == 64) { + *cmode = 0xe; + *op = 1; + *imm8 = (extract64(v64, 0, 1) << 0) + | (extract64(v64, 8, 1) << 1) + | (extract64(v64, 16, 1) << 2) + | (extract64(v64, 24, 1) << 3) + | (extract64(v64, 32, 1) << 4) + | (extract64(v64, 40, 1) << 5) + | (extract64(v64, 48, 1) << 6) + | (extract64(v64, 56, 1) << 7); + return true; + } + return false; +} + static int tcg_target_const_match(tcg_target_long val, TCGType type, const TCGArgConstraint *arg_ct) { @@ -271,6 +381,9 @@ typedef enum { /* Load literal for loading the address at pc-relative offset */ I3305_LDR = 0x58000000, + I3305_LDR_v64 = 0x5c000000, + I3305_LDR_v128 = 0x9c000000, + /* Load/store register. Described here as 3.3.12, but the helper that emits them can transform to 3.3.10 or 3.3.13. */ I3312_STRB = 0x38000000 | LDST_ST << 22 | MO_8 << 30, @@ -290,6 +403,15 @@ typedef enum { I3312_LDRSHX = 0x38000000 | LDST_LD_S_X << 22 | MO_16 << 30, I3312_LDRSWX = 0x38000000 | LDST_LD_S_X << 22 | MO_32 << 30, + I3312_LDRVS = 0x3c000000 | LDST_LD << 22 | MO_32 << 30, + I3312_STRVS = 0x3c000000 | LDST_ST << 22 | MO_32 << 30, + + I3312_LDRVD = 0x3c000000 | LDST_LD << 22 | MO_64 << 30, + I3312_STRVD = 0x3c000000 | LDST_ST << 22 | MO_64 << 30, + + I3312_LDRVQ = 0x3c000000 | 3 << 22 | 0 << 30, + I3312_STRVQ = 0x3c000000 | 2 << 22 | 0 << 30, + I3312_TO_I3310 = 0x00200800, I3312_TO_I3313 = 0x01000000, @@ -374,8 +496,58 @@ typedef enum { I3510_EON = 0x4a200000, I3510_ANDS = 0x6a000000, - NOP = 0xd503201f, + /* AdvSIMD zip.uzp/trn */ + I3603_ZIP1 = 0x0e003800, + I3603_UZP1 = 0x0e001800, + I3603_TRN1 = 0x0e002800, + I3603_ZIP2 = 0x0e007800, + I3603_UZP2 = 0x0e005800, + I3603_TRN2 = 0x0e006800, + + /* AdvSIMD copy */ + I3605_DUP = 0x0e000400, + I3605_INS = 0x4e001c00, + I3605_UMOV = 0x0e003c00, + + /* AdvSIMD modified immediate */ + I3606_MOVI = 0x0f000400, + + /* AdvSIMD shift by immediate */ + I3614_SSHR = 0x0f000400, + I3614_SSRA = 0x0f001400, + I3614_SHL = 0x0f005400, + I3614_SSHLL = 0x0f00a400, + I3614_USHR = 0x2f000400, + I3614_USRA = 0x2f001400, + I3614_USHLL = 0x2f00a400, + + /* AdvSIMD three same. */ + I3616_ADD = 0x0e208400, + I3616_AND = 0x0e201c00, + I3616_BIC = 0x0e601c00, + I3616_EOR = 0x2e201c00, + I3616_MUL = 0x0e209c00, + I3616_ORR = 0x0ea01c00, + I3616_ORN = 0x0ee01c00, + I3616_SUB = 0x2e208400, + I3616_CMGT = 0x0e203400, + I3616_CMGE = 0x0e203c00, + I3616_CMTST = 0x0e208c00, + I3616_CMHI = 0x2e203400, + I3616_CMHS = 0x2e203c00, + I3616_CMEQ = 0x2e208c00, + + /* AdvSIMD two-reg misc. */ + I3617_CMGT0 = 0x0e208800, + I3617_CMEQ0 = 0x0e209800, + I3617_CMLT0 = 0x0e20a800, + I3617_CMGE0 = 0x2e208800, + I3617_CMLE0 = 0x2e20a800, + I3617_NOT = 0x2e205800, + I3617_NEG = 0x2e20b800, + /* System instructions. */ + NOP = 0xd503201f, DMB_ISH = 0xd50338bf, DMB_LD = 0x00000100, DMB_ST = 0x00000200, @@ -520,26 +692,71 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext, tcg_out32(s, insn | ext << 31 | rm << 16 | ra << 10 | rn << 5 | rd); } +static void tcg_out_insn_3603(TCGContext *s, AArch64Insn insn, bool q, + unsigned size, TCGReg rd, TCGReg rn, TCGReg rm) +{ + tcg_out32(s, insn | q << 30 | (size << 22) | (rd & 0x1f) + | (rn & 0x1f) << 5 | (rm & 0x1f) << 16); +} + +static void tcg_out_insn_3605(TCGContext *s, AArch64Insn insn, bool q, + TCGReg rd, TCGReg rn, int dst_idx, int src_idx) +{ + /* Note that bit 11 set means general register input. Therefore + we can handle both register sets with one function. */ + tcg_out32(s, insn | q << 30 | (dst_idx << 16) | (src_idx << 11) + | (rd & 0x1f) | (~rn & 0x20) << 6 | (rn & 0x1f) << 5); +} + +static void tcg_out_insn_3606(TCGContext *s, AArch64Insn insn, bool q, + TCGReg rd, bool op, int cmode, uint8_t imm8) +{ + tcg_out32(s, insn | q << 30 | op << 29 | cmode << 12 | (rd & 0x1f) + | (imm8 & 0xe0) << (16 - 5) | (imm8 & 0x1f) << 5); +} + +static void tcg_out_insn_3614(TCGContext *s, AArch64Insn insn, bool q, + TCGReg rd, TCGReg rn, unsigned immhb) +{ + tcg_out32(s, insn | q << 30 | immhb << 16 + | (rn & 0x1f) << 5 | (rd & 0x1f)); +} + +static void tcg_out_insn_3616(TCGContext *s, AArch64Insn insn, bool q, + unsigned size, TCGReg rd, TCGReg rn, TCGReg rm) +{ + tcg_out32(s, insn | q << 30 | (size << 22) | (rm & 0x1f) << 16 + | (rn & 0x1f) << 5 | (rd & 0x1f)); +} + +static void tcg_out_insn_3617(TCGContext *s, AArch64Insn insn, bool q, + unsigned size, TCGReg rd, TCGReg rn) +{ + tcg_out32(s, insn | q << 30 | (size << 22) + | (rn & 0x1f) << 5 | (rd & 0x1f)); +} + static void tcg_out_insn_3310(TCGContext *s, AArch64Insn insn, TCGReg rd, TCGReg base, TCGType ext, TCGReg regoff) { /* Note the AArch64Insn constants above are for C3.3.12. Adjust. */ tcg_out32(s, insn | I3312_TO_I3310 | regoff << 16 | - 0x4000 | ext << 13 | base << 5 | rd); + 0x4000 | ext << 13 | base << 5 | (rd & 0x1f)); } static void tcg_out_insn_3312(TCGContext *s, AArch64Insn insn, TCGReg rd, TCGReg rn, intptr_t offset) { - tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | rd); + tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | (rd & 0x1f)); } static void tcg_out_insn_3313(TCGContext *s, AArch64Insn insn, TCGReg rd, TCGReg rn, uintptr_t scaled_uimm) { /* Note the AArch64Insn constants above are for C3.3.12. Adjust. */ - tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 | rn << 5 | rd); + tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 + | rn << 5 | (rd & 0x1f)); } /* Register to register move using ORR (shifted register with no shift). */ @@ -585,6 +802,35 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext, tcg_out_insn_3404(s, insn, ext, rd, rn, ext, r, c); } +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, + TCGReg rd, uint64_t v64) +{ + int op, cmode, imm8; + + if (is_fimm(v64, &op, &cmode, &imm8)) { + tcg_out_insn(s, 3606, MOVI, type == TCG_TYPE_V128, rd, op, cmode, imm8); + } else if (type == TCG_TYPE_V128) { + new_pool_l2(s, R_AARCH64_CONDBR19, s->code_ptr, 0, v64, v64); + tcg_out_insn(s, 3305, LDR_v128, 0, rd); + } else { + new_pool_label(s, v64, R_AARCH64_CONDBR19, s->code_ptr, 0); + tcg_out_insn(s, 3305, LDR_v64, 0, rd); + } +} + +static void tcg_out_movi_vec(TCGContext *s, TCGType type, + TCGReg ret, const TCGArg *a) +{ + if (type == TCG_TYPE_V128) { + /* We assume that INDEX_op_dupi could not be used and + therefore we must use a constant pool entry. */ + new_pool_l2(s, R_AARCH64_CONDBR19, s->code_ptr, 0, a[0], a[1]); + tcg_out_insn(s, 3305, LDR_v128, 0, ret); + } else { + tcg_out_dupi_vec(s, TCG_TYPE_V64, ret, a[0]); + } +} + static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd, tcg_target_long value) { @@ -594,6 +840,22 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd, int s0, s1; AArch64Insn opc; + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + tcg_debug_assert(rd < 32); + break; + + case TCG_TYPE_V64: + case TCG_TYPE_V128: + tcg_debug_assert(rd >= 32); + tcg_out_dupi_vec(s, type, rd, value); + return; + + default: + g_assert_not_reached(); + } + /* For 32-bit values, discard potential garbage in value. For 64-bit values within [2**31, 2**32-1], we can create smaller sequences by interpreting this as a negative 32-bit number, while ensuring that @@ -669,15 +931,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd, /* Define something more legible for general use. */ #define tcg_out_ldst_r tcg_out_insn_3310 -static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, - TCGReg rd, TCGReg rn, intptr_t offset) +static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd, + TCGReg rn, intptr_t offset, int lgsize) { - TCGMemOp size = (uint32_t)insn >> 30; - /* If the offset is naturally aligned and in range, then we can use the scaled uimm12 encoding */ - if (offset >= 0 && !(offset & ((1 << size) - 1))) { - uintptr_t scaled_uimm = offset >> size; + if (offset >= 0 && !(offset & ((1 << lgsize) - 1))) { + uintptr_t scaled_uimm = offset >> lgsize; if (scaled_uimm <= 0xfff) { tcg_out_insn_3313(s, insn, rd, rn, scaled_uimm); return; @@ -695,32 +955,102 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP); } -static inline void tcg_out_mov(TCGContext *s, - TCGType type, TCGReg ret, TCGReg arg) +static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) { - if (ret != arg) { - tcg_out_movr(s, type, ret, arg); + if (ret == arg) { + return; + } + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + if (ret < 32 && arg < 32) { + tcg_out_movr(s, type, ret, arg); + break; + } else if (ret < 32) { + tcg_out_insn(s, 3605, UMOV, type, ret, arg, 0, 0); + break; + } else if (arg < 32) { + tcg_out_insn(s, 3605, INS, 0, ret, arg, 4 << type, 0); + break; + } + /* FALLTHRU */ + + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 32 && arg >= 32); + tcg_out_insn(s, 3616, ORR, 0, 0, ret, arg, arg); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 32 && arg >= 32); + tcg_out_insn(s, 3616, ORR, 1, 0, ret, arg, arg); + break; + + default: + g_assert_not_reached(); } } -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg base, intptr_t ofs) { - tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_LDRW : I3312_LDRX, - arg, arg1, arg2); + AArch64Insn insn; + int lgsz; + + switch (type) { + case TCG_TYPE_I32: + insn = (ret < 32 ? I3312_LDRW : I3312_LDRVS); + lgsz = 2; + break; + case TCG_TYPE_I64: + insn = (ret < 32 ? I3312_LDRX : I3312_LDRVD); + lgsz = 3; + break; + case TCG_TYPE_V64: + insn = I3312_LDRVD; + lgsz = 3; + break; + case TCG_TYPE_V128: + insn = I3312_LDRVQ; + lgsz = 4; + break; + default: + g_assert_not_reached(); + } + tcg_out_ldst(s, insn, ret, base, ofs, lgsz); } -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg src, + TCGReg base, intptr_t ofs) { - tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_STRW : I3312_STRX, - arg, arg1, arg2); + AArch64Insn insn; + int lgsz; + + switch (type) { + case TCG_TYPE_I32: + insn = (src < 32 ? I3312_STRW : I3312_STRVS); + lgsz = 2; + break; + case TCG_TYPE_I64: + insn = (src < 32 ? I3312_STRX : I3312_STRVD); + lgsz = 3; + break; + case TCG_TYPE_V64: + insn = I3312_STRVD; + lgsz = 3; + break; + case TCG_TYPE_V128: + insn = I3312_STRVQ; + lgsz = 4; + break; + default: + g_assert_not_reached(); + } + tcg_out_ldst(s, insn, src, base, ofs, lgsz); } static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, TCGReg base, intptr_t ofs) { - if (val == 0) { + if (type <= TCG_TYPE_I64 && val == 0) { tcg_out_st(s, type, TCG_REG_XZR, base, ofs); return true; } @@ -1210,14 +1540,15 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc, /* Merge "low bits" from tlb offset, load the tlb comparator into X0. X0 = load [X2 + (tlb_offset & 0x000fff)] */ tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX, - TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff); + TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff, + TARGET_LONG_BITS == 32 ? 2 : 3); /* Load the tlb addend. Do that early to avoid stalling. X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */ tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2, (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) - (is_read ? offsetof(CPUTLBEntry, addr_read) - : offsetof(CPUTLBEntry, addr_write))); + : offsetof(CPUTLBEntry, addr_write)), 3); /* Perform the address comparison. */ tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0); @@ -1435,49 +1766,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_ld8u_i32: case INDEX_op_ld8u_i64: - tcg_out_ldst(s, I3312_LDRB, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRB, a0, a1, a2, 0); break; case INDEX_op_ld8s_i32: - tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2, 0); break; case INDEX_op_ld8s_i64: - tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2, 0); break; case INDEX_op_ld16u_i32: case INDEX_op_ld16u_i64: - tcg_out_ldst(s, I3312_LDRH, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRH, a0, a1, a2, 1); break; case INDEX_op_ld16s_i32: - tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2, 1); break; case INDEX_op_ld16s_i64: - tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2, 1); break; case INDEX_op_ld_i32: case INDEX_op_ld32u_i64: - tcg_out_ldst(s, I3312_LDRW, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRW, a0, a1, a2, 2); break; case INDEX_op_ld32s_i64: - tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2, 2); break; case INDEX_op_ld_i64: - tcg_out_ldst(s, I3312_LDRX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRX, a0, a1, a2, 3); break; case INDEX_op_st8_i32: case INDEX_op_st8_i64: - tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2, 0); break; case INDEX_op_st16_i32: case INDEX_op_st16_i64: - tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2, 1); break; case INDEX_op_st_i32: case INDEX_op_st32_i64: - tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2, 2); break; case INDEX_op_st_i64: - tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2, 3); break; case INDEX_op_add_i32: @@ -1776,25 +2107,230 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: + case INDEX_op_mov_vec: case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi. */ case INDEX_op_movi_i64: + case INDEX_op_dupi_vec: case INDEX_op_call: /* Always emitted via tcg_out_call. */ default: - tcg_abort(); + g_assert_not_reached(); } #undef REG0 } +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, + unsigned vecl, unsigned vece, + const TCGArg *args, const int *const_args) +{ + static const AArch64Insn cmp_insn[16] = { + [TCG_COND_EQ] = I3616_CMEQ, + [TCG_COND_GT] = I3616_CMGT, + [TCG_COND_GE] = I3616_CMGE, + [TCG_COND_GTU] = I3616_CMHI, + [TCG_COND_GEU] = I3616_CMHS, + }; + static const AArch64Insn cmp0_insn[16] = { + [TCG_COND_EQ] = I3617_CMEQ0, + [TCG_COND_GT] = I3617_CMGT0, + [TCG_COND_GE] = I3617_CMGE0, + [TCG_COND_LT] = I3617_CMLT0, + [TCG_COND_LE] = I3617_CMLE0, + }; + + TCGType type = vecl + TCG_TYPE_V64; + unsigned is_q = vecl; + TCGArg a0, a1, a2; + + a0 = args[0]; + a1 = args[1]; + a2 = args[2]; + + switch (opc) { + case INDEX_op_movi_vec: + tcg_out_movi_vec(s, type, a0, args + 1); + break; + case INDEX_op_ld_vec: + tcg_out_ld(s, type, a0, a1, a2); + break; + case INDEX_op_st_vec: + tcg_out_st(s, type, a0, a1, a2); + break; + case INDEX_op_add_vec: + tcg_out_insn(s, 3616, ADD, is_q, vece, a0, a1, a2); + break; + case INDEX_op_sub_vec: + tcg_out_insn(s, 3616, SUB, is_q, vece, a0, a1, a2); + break; + case INDEX_op_mul_vec: + tcg_out_insn(s, 3616, MUL, is_q, vece, a0, a1, a2); + break; + case INDEX_op_neg_vec: + tcg_out_insn(s, 3617, NEG, is_q, vece, a0, a1); + break; + case INDEX_op_and_vec: + tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2); + break; + case INDEX_op_or_vec: + tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2); + break; + case INDEX_op_xor_vec: + tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2); + break; + case INDEX_op_andc_vec: + tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2); + break; + case INDEX_op_orc_vec: + tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2); + break; + case INDEX_op_not_vec: + tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a1); + break; + case INDEX_op_dup_vec: + tcg_out_insn(s, 3605, DUP, is_q, a0, a1, 1 << vece, 0); + break; + case INDEX_op_shli_vec: + tcg_out_insn(s, 3614, SHL, is_q, a0, a1, a2 + (8 << vece)); + break; + case INDEX_op_shri_vec: + tcg_out_insn(s, 3614, USHR, is_q, a0, a1, (16 << vece) - a2); + break; + case INDEX_op_sari_vec: + tcg_out_insn(s, 3614, SSHR, is_q, a0, a1, (16 << vece) - a2); + break; + case INDEX_op_cmp_vec: + { + TCGCond cond = args[3]; + AArch64Insn insn; + + if (cond == TCG_COND_NE) { + if (const_args[2]) { + tcg_out_insn(s, 3616, CMTST, is_q, vece, a0, a1, a1); + } else { + tcg_out_insn(s, 3616, CMEQ, is_q, vece, a0, a1, a2); + tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a0); + } + } else { + if (const_args[2]) { + insn = cmp0_insn[cond]; + if (insn) { + tcg_out_insn_3617(s, insn, is_q, vece, a0, a1); + break; + } + tcg_out_dupi_vec(s, type, TCG_VEC_TMP, 0); + a2 = TCG_VEC_TMP; + } + insn = cmp_insn[cond]; + if (insn == 0) { + TCGArg t; + t = a1, a1 = a2, a2 = t; + cond = tcg_swap_cond(cond); + insn = cmp_insn[cond]; + tcg_debug_assert(insn != 0); + } + tcg_out_insn_3616(s, insn, is_q, vece, a0, a1, a2); + } + } + break; + case INDEX_op_zipl_vec: + tcg_out_insn(s, 3603, ZIP1, is_q, vece, a0, a1, a2); + break; + case INDEX_op_ziph_vec: + tcg_out_insn(s, 3603, ZIP2, is_q, vece, a0, a1, a2); + break; + case INDEX_op_uzpe_vec: + tcg_out_insn(s, 3603, UZP1, is_q, vece, a0, a1, a2); + break; + case INDEX_op_uzpo_vec: + tcg_out_insn(s, 3603, UZP2, is_q, vece, a0, a1, a2); + break; + case INDEX_op_trne_vec: + tcg_out_insn(s, 3603, TRN1, is_q, vece, a0, a1, a2); + break; + case INDEX_op_trno_vec: + tcg_out_insn(s, 3603, TRN2, is_q, vece, a0, a1, a2); + break; + case INDEX_op_extul_vec: + tcg_out_insn(s, 3614, USHLL, 0, a0, a1, 0 + (8 << vece)); + break; + case INDEX_op_extuh_vec: + if (is_q) { + tcg_out_insn(s, 3614, USHLL, 1, a0, a1, 0 + (8 << vece)); + } else { + tcg_out_insn(s, 3614, USHLL, 0, a0, a1, 0 + (8 << vece)); + tcg_out_insn(s, 3605, INS, 0, a0, a0, 8, 16); + } + break; + case INDEX_op_extsl_vec: + tcg_out_insn(s, 3614, SSHLL, 0, a0, a1, 0 + (8 << vece)); + break; + case INDEX_op_extsh_vec: + if (is_q) { + tcg_out_insn(s, 3614, SSHLL, 1, a0, a1, 0 + (8 << vece)); + } else { + tcg_out_insn(s, 3614, SSHLL, 0, a0, a1, 0 + (8 << vece)); + tcg_out_insn(s, 3605, INS, 0, a0, a0, 8, 16); + } + break; + default: + g_assert_not_reached(); + } +} + +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) +{ + switch (opc) { + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_mul_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_orc_vec: + case INDEX_op_neg_vec: + case INDEX_op_not_vec: + case INDEX_op_cmp_vec: + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + case INDEX_op_extul_vec: + case INDEX_op_extuh_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extsh_vec: + return 1; + + default: + return 0; + } +} + +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg a0, ...) +{ +} + static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) { static const TCGTargetOpDef r = { .args_ct_str = { "r" } }; + static const TCGTargetOpDef w = { .args_ct_str = { "w" } }; static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } }; + static const TCGTargetOpDef w_w = { .args_ct_str = { "w", "w" } }; + static const TCGTargetOpDef w_r = { .args_ct_str = { "w", "r" } }; + static const TCGTargetOpDef w_wr = { .args_ct_str = { "w", "wr" } }; static const TCGTargetOpDef r_l = { .args_ct_str = { "r", "l" } }; static const TCGTargetOpDef r_rA = { .args_ct_str = { "r", "rA" } }; static const TCGTargetOpDef rZ_r = { .args_ct_str = { "rZ", "r" } }; static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } }; static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } }; + static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } }; + static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } }; static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } }; static const TCGTargetOpDef r_r_rA = { .args_ct_str = { "r", "r", "rA" } }; static const TCGTargetOpDef r_r_rL = { .args_ct_str = { "r", "r", "rL" } }; @@ -1938,6 +2474,41 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) case INDEX_op_sub2_i64: return &add2; + case INDEX_op_movi_vec: + return &w; + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_mul_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_orc_vec: + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + return &w_w_w; + case INDEX_op_not_vec: + case INDEX_op_neg_vec: + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + case INDEX_op_extul_vec: + case INDEX_op_extuh_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extsh_vec: + return &w_w; + case INDEX_op_ld_vec: + case INDEX_op_st_vec: + return &w_r; + case INDEX_op_dup_vec: + return &w_wr; + case INDEX_op_cmp_vec: + return &w_w_wZ; + default: return NULL; } @@ -1947,8 +2518,10 @@ static void tcg_target_init(TCGContext *s) { tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffffu; tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffffu; + tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull; + tcg_target_available_regs[TCG_TYPE_V128] = 0xffffffff00000000ull; - tcg_target_call_clobber_regs = 0xfffffffu; + tcg_target_call_clobber_regs = -1ull; tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X19); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X20); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X21); @@ -1960,12 +2533,21 @@ static void tcg_target_init(TCGContext *s) tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X27); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X28); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X29); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V8); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V9); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V10); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V11); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V12); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V13); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V14); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V15); s->reserved_regs = 0; tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP); tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP); tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP); tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */ + tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP); } /* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)). */ From patchwork Mon Dec 18 17:17:55 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122268 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3126736qgn; Mon, 18 Dec 2017 09:45:32 -0800 (PST) X-Google-Smtp-Source: ACJfBou9vx2LtA6OWXOQCt6ieJxygVxnsdp0efW5Vc2xrEn5Wfu1nnGXF47AIg+DihElO77uU2YH X-Received: by 10.37.55.85 with SMTP id e82mr444836yba.177.1513619132382; Mon, 18 Dec 2017 09:45:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513619132; cv=none; d=google.com; s=arc-20160816; b=LyRftaHsIaeunSs8uINO+R6XF2pNW2I4nzVq04bflCnCBrow2uKLHQJ4jRezmZ0ezP Nr3/R8iLEFtUuQLbPdiNduCQoUuxtJMnZLAINJCuE4m203lSrRRm9BKs4gJ62S/mFM4W /scsIR3EScK0p7JvhvU5YQYbJlw3T6gC1NcC9Kc9bYF4F006faiSzr4NQxLN/MEmZNi7 cVWUR39tBdKch+x30IJJFjmbLcbgWyWa6ZFhYQlHsDwrJiaCRfWcq7tDXhauh/yKXm6H mRURY248P4jFNyBBz+z02vmzRebu4W7Bdd9MGzKC+OVlMlIYjwfSdWIw5C3ZJWzmPJ5T lbcA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=ecSxpiLajjqbroC9M5HiLYa4/wffUO6dfZeKFuckgCw=; b=VpVJ2fNGBsuHfadsKiQDWyb1EbtBHNmHYFfRpM444ejShjmK7K6j2vq8g7SRQBrcIS C/rGn7DRk/qOOEuY4pHmUCuazdlK5jWJfjAvxekyUMAgPv0qhhoQoSkIQA2Aq/IBuTuP qc8rBQuWBBCaiMHVSxMdH+6zJ+qJIrwpBnIXJDFMEWz9vjErEhnurZCzjSYsOM5AdsAd o075gfBXKMaL0rQgec1osCR5Z8qtUVvbV6javAAkD5MRWmeavIeC1r2AqbVhLh2l/2fe Jb4GrCASR+ZKQqMb4em0lV4eB2M30VtSc3vjzSAOFAS7uMDZ/wI9siobsCeYf1xHU3sl 7ncA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=X8AEyKWG; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id x185si1163453ybb.22.2017.12.18.09.45.32 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:45:32 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=X8AEyKWG; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:59086 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzTn-0007Bj-QQ for patch@linaro.org; Mon, 18 Dec 2017 12:45:31 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37554) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3o-0000i1-L8 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:42 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3n-0001W1-2T for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:40 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:37937) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3m-0001V0-R7 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:39 -0500 Received: by mail-pl0-x244.google.com with SMTP id s10so5192334plj.5 for ; Mon, 18 Dec 2017 09:18:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ecSxpiLajjqbroC9M5HiLYa4/wffUO6dfZeKFuckgCw=; b=X8AEyKWGCGXQ3Q4kk3p+kEddgnz+yFaz4ECK/SpS6AggxmClZ+HNHYJFyWI0lMdane 7/FxfgFnYqP9QlvfUHrHrPjNmS3NbdYNQLvsjto2pvgW1HdpVtgHs5KH+7TvEV5MHoMt mZ/09aQNxookpiB+AOS3l96cu2IuERw1R2IoI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ecSxpiLajjqbroC9M5HiLYa4/wffUO6dfZeKFuckgCw=; b=qVienpWCnHAX0/zyVV4YaF76t0bBf4ycDoXdKcPf7WAXfsm2pDlSxYraKSETL46C04 H+mQ6+ksDitOFwWykQHN7WxsN31dO0n6gXYX24gj0XiaUoTY7HjQS+WnWKE4qM90/fDN 0wKXIySv62TW2gN4F+k4cWcztpRfuZ5Dr6zOKdscCvwxpG+2SHlUyQ6ga9YELmOUzhyq EUPk/kxKbVceDOT1kBU7FUIGC9tloKu2CZ1EiM30hH1+XVH5NelVA8Z/WQe+Q1Yzod5O axn8msdUUBWIsDKtAcHG0RUxaXjZp1VUkbnN962FphB40l7t2xPW4sG6lHso+e7EXRwQ Xr2Q== X-Gm-Message-State: AKGB3mLwFqqyfyQMQRlQv6fVM2cwj9IwPfoF0Qok16TwZ6lWPz7+fnZm Fv3k09fcP3JAQDDbOrUwhUxmyHSwl6A= X-Received: by 10.84.252.148 with SMTP id y20mr385463pll.245.1513617517343; Mon, 18 Dec 2017 09:18:37 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.36 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:36 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:55 -0800 Message-Id: <20171218171758.16964-24-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v7 23/26] tcg/optimize: Handle vector opcodes during optimize X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Trivial move and constant propagation. Some identity and constant function folding, but nothing that requires knowledge of the size of the vector element. Signed-off-by: Richard Henderson --- tcg/optimize.c | 141 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 68 insertions(+), 73 deletions(-) -- 2.14.3 diff --git a/tcg/optimize.c b/tcg/optimize.c index 2cbbeefd53..f8ddf4dd33 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -32,6 +32,11 @@ glue(glue(case INDEX_op_, x), _i32): \ glue(glue(case INDEX_op_, x), _i64) +#define CASE_OP_32_64_VEC(x) \ + glue(glue(case INDEX_op_, x), _i32): \ + glue(glue(case INDEX_op_, x), _i64): \ + glue(glue(case INDEX_op_, x), _vec) + struct tcg_temp_info { bool is_const; TCGTemp *prev_copy; @@ -108,40 +113,6 @@ static void init_arg_info(struct tcg_temp_info *infos, init_ts_info(infos, temps_used, arg_temp(arg)); } -static int op_bits(TCGOpcode op) -{ - const TCGOpDef *def = &tcg_op_defs[op]; - return def->flags & TCG_OPF_64BIT ? 64 : 32; -} - -static TCGOpcode op_to_mov(TCGOpcode op) -{ - switch (op_bits(op)) { - case 32: - return INDEX_op_mov_i32; - case 64: - return INDEX_op_mov_i64; - default: - fprintf(stderr, "op_to_mov: unexpected return value of " - "function op_bits.\n"); - tcg_abort(); - } -} - -static TCGOpcode op_to_movi(TCGOpcode op) -{ - switch (op_bits(op)) { - case 32: - return INDEX_op_movi_i32; - case 64: - return INDEX_op_movi_i64; - default: - fprintf(stderr, "op_to_movi: unexpected return value of " - "function op_bits.\n"); - tcg_abort(); - } -} - static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts) { TCGTemp *i; @@ -199,11 +170,23 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2) static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val) { - TCGOpcode new_op = op_to_movi(op->opc); + const TCGOpDef *def; + TCGOpcode new_op; tcg_target_ulong mask; struct tcg_temp_info *di = arg_info(dst); + def = &tcg_op_defs[op->opc]; + if (def->flags & TCG_OPF_VECTOR) { + new_op = INDEX_op_dupi_vec; + } else if (def->flags & TCG_OPF_64BIT) { + new_op = INDEX_op_movi_i64; + } else { + new_op = INDEX_op_movi_i32; + } op->opc = new_op; + /* TCGOP_VECL and TCGOP_VECE remain unchanged. */ + op->args[0] = dst; + op->args[1] = val; reset_temp(dst); di->is_const = true; @@ -214,15 +197,13 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val) mask |= ~0xffffffffull; } di->mask = mask; - - op->args[0] = dst; - op->args[1] = val; } static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src) { TCGTemp *dst_ts = arg_temp(dst); TCGTemp *src_ts = arg_temp(src); + const TCGOpDef *def; struct tcg_temp_info *di; struct tcg_temp_info *si; tcg_target_ulong mask; @@ -236,9 +217,16 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src) reset_ts(dst_ts); di = ts_info(dst_ts); si = ts_info(src_ts); - new_op = op_to_mov(op->opc); - + def = &tcg_op_defs[op->opc]; + if (def->flags & TCG_OPF_VECTOR) { + new_op = INDEX_op_mov_vec; + } else if (def->flags & TCG_OPF_64BIT) { + new_op = INDEX_op_mov_i64; + } else { + new_op = INDEX_op_mov_i32; + } op->opc = new_op; + /* TCGOP_VECL and TCGOP_VECE remain unchanged. */ op->args[0] = dst; op->args[1] = src; @@ -417,8 +405,9 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y) static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y) { + const TCGOpDef *def = &tcg_op_defs[op]; TCGArg res = do_constant_folding_2(op, x, y); - if (op_bits(op) == 32) { + if (!(def->flags & TCG_OPF_64BIT)) { res = (int32_t)res; } return res; @@ -508,13 +497,12 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x, tcg_target_ulong xv = arg_info(x)->val; tcg_target_ulong yv = arg_info(y)->val; if (arg_is_const(x) && arg_is_const(y)) { - switch (op_bits(op)) { - case 32: - return do_constant_folding_cond_32(xv, yv, c); - case 64: + const TCGOpDef *def = &tcg_op_defs[op]; + tcg_debug_assert(!(def->flags & TCG_OPF_VECTOR)); + if (def->flags & TCG_OPF_64BIT) { return do_constant_folding_cond_64(xv, yv, c); - default: - tcg_abort(); + } else { + return do_constant_folding_cond_32(xv, yv, c); } } else if (args_are_copies(x, y)) { return do_constant_folding_cond_eq(c); @@ -653,11 +641,11 @@ void tcg_optimize(TCGContext *s) /* For commutative operations make constant second argument */ switch (opc) { - CASE_OP_32_64(add): - CASE_OP_32_64(mul): - CASE_OP_32_64(and): - CASE_OP_32_64(or): - CASE_OP_32_64(xor): + CASE_OP_32_64_VEC(add): + CASE_OP_32_64_VEC(mul): + CASE_OP_32_64_VEC(and): + CASE_OP_32_64_VEC(or): + CASE_OP_32_64_VEC(xor): CASE_OP_32_64(eqv): CASE_OP_32_64(nand): CASE_OP_32_64(nor): @@ -722,7 +710,7 @@ void tcg_optimize(TCGContext *s) continue; } break; - CASE_OP_32_64(sub): + CASE_OP_32_64_VEC(sub): { TCGOpcode neg_op; bool have_neg; @@ -734,9 +722,12 @@ void tcg_optimize(TCGContext *s) if (opc == INDEX_op_sub_i32) { neg_op = INDEX_op_neg_i32; have_neg = TCG_TARGET_HAS_neg_i32; - } else { + } else if (opc == INDEX_op_sub_i64) { neg_op = INDEX_op_neg_i64; have_neg = TCG_TARGET_HAS_neg_i64; + } else { + neg_op = INDEX_op_neg_vec; + have_neg = TCG_TARGET_HAS_neg_vec; } if (!have_neg) { break; @@ -750,7 +741,7 @@ void tcg_optimize(TCGContext *s) } } break; - CASE_OP_32_64(xor): + CASE_OP_32_64_VEC(xor): CASE_OP_32_64(nand): if (!arg_is_const(op->args[1]) && arg_is_const(op->args[2]) @@ -767,7 +758,7 @@ void tcg_optimize(TCGContext *s) goto try_not; } break; - CASE_OP_32_64(andc): + CASE_OP_32_64_VEC(andc): if (!arg_is_const(op->args[2]) && arg_is_const(op->args[1]) && arg_info(op->args[1])->val == -1) { @@ -775,7 +766,7 @@ void tcg_optimize(TCGContext *s) goto try_not; } break; - CASE_OP_32_64(orc): + CASE_OP_32_64_VEC(orc): CASE_OP_32_64(eqv): if (!arg_is_const(op->args[2]) && arg_is_const(op->args[1]) @@ -789,7 +780,10 @@ void tcg_optimize(TCGContext *s) TCGOpcode not_op; bool have_not; - if (def->flags & TCG_OPF_64BIT) { + if (def->flags & TCG_OPF_VECTOR) { + not_op = INDEX_op_not_vec; + have_not = TCG_TARGET_HAS_not_vec; + } else if (def->flags & TCG_OPF_64BIT) { not_op = INDEX_op_not_i64; have_not = TCG_TARGET_HAS_not_i64; } else { @@ -810,16 +804,16 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, const => mov r, a" cases */ switch (opc) { - CASE_OP_32_64(add): - CASE_OP_32_64(sub): + CASE_OP_32_64_VEC(add): + CASE_OP_32_64_VEC(sub): + CASE_OP_32_64_VEC(or): + CASE_OP_32_64_VEC(xor): + CASE_OP_32_64_VEC(andc): CASE_OP_32_64(shl): CASE_OP_32_64(shr): CASE_OP_32_64(sar): CASE_OP_32_64(rotl): CASE_OP_32_64(rotr): - CASE_OP_32_64(or): - CASE_OP_32_64(xor): - CASE_OP_32_64(andc): if (!arg_is_const(op->args[1]) && arg_is_const(op->args[2]) && arg_info(op->args[2])->val == 0) { @@ -827,8 +821,8 @@ void tcg_optimize(TCGContext *s) continue; } break; - CASE_OP_32_64(and): - CASE_OP_32_64(orc): + CASE_OP_32_64_VEC(and): + CASE_OP_32_64_VEC(orc): CASE_OP_32_64(eqv): if (!arg_is_const(op->args[1]) && arg_is_const(op->args[2]) @@ -1042,8 +1036,8 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, 0 => movi r, 0" cases */ switch (opc) { - CASE_OP_32_64(and): - CASE_OP_32_64(mul): + CASE_OP_32_64_VEC(and): + CASE_OP_32_64_VEC(mul): CASE_OP_32_64(muluh): CASE_OP_32_64(mulsh): if (arg_is_const(op->args[2]) @@ -1058,8 +1052,8 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, a => mov r, a" cases */ switch (opc) { - CASE_OP_32_64(or): - CASE_OP_32_64(and): + CASE_OP_32_64_VEC(or): + CASE_OP_32_64_VEC(and): if (args_are_copies(op->args[1], op->args[2])) { tcg_opt_gen_mov(s, op, op->args[0], op->args[1]); continue; @@ -1071,9 +1065,9 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, a => movi r, 0" cases */ switch (opc) { - CASE_OP_32_64(andc): - CASE_OP_32_64(sub): - CASE_OP_32_64(xor): + CASE_OP_32_64_VEC(andc): + CASE_OP_32_64_VEC(sub): + CASE_OP_32_64_VEC(xor): if (args_are_copies(op->args[1], op->args[2])) { tcg_opt_gen_movi(s, op, op->args[0], 0); continue; @@ -1087,10 +1081,11 @@ void tcg_optimize(TCGContext *s) folding. Constants will be substituted to arguments by register allocator where needed and possible. Also detect copies. */ switch (opc) { - CASE_OP_32_64(mov): + CASE_OP_32_64_VEC(mov): tcg_opt_gen_mov(s, op, op->args[0], op->args[1]); break; CASE_OP_32_64(movi): + case INDEX_op_dupi_vec: tcg_opt_gen_movi(s, op, op->args[0], op->args[1]); break; From patchwork Mon Dec 18 17:17:56 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122269 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3127052qgn; Mon, 18 Dec 2017 09:45:53 -0800 (PST) X-Google-Smtp-Source: ACJfBovxwWCTQ7vcJTkszI8mfrV/TbTLEmQ79au5gTBsuuYddLTnUQkknV2V05eA2eHWKPYyfKAb X-Received: by 10.129.172.13 with SMTP id k13mr416490ywh.275.1513619153063; Mon, 18 Dec 2017 09:45:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513619153; cv=none; d=google.com; s=arc-20160816; b=T4z3cBgoZSNffR+dpyD+NF7OdE7EhE9AkrDlFpnxkyq5d0Tw2R/yhZyXxdE8aK5pPI jY7uO7gQGL+XKNFH+aVM9QvYPBXzEcXlliTgZNTe0d8u9Gm6bgHGkshS7mJsCiBeD9Lh B5oULMQz2P3SpD0cvXq5I/W+qw3FiQn18YhcrzeWUqyJzWtMKOYH4J9sApFj87Bh9H/G TtB1teezZJMoAWMQ2geGwcC85gc7ebFVF/XQdHb27X64nVW7jk2shw2Cs70RzLe/f/eK 7KAw2wUqUM92gmf10dqADi/Z5AynllF/cd7jBeP2PJr4FoOaNJ0VXXIEdHdNu/f47CSX 9lXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=8i/JRKaTSpgOJSSRFkW5UKcEKW6ZwDPhYDe8Wqjfx8I=; b=oB7bZj7toQ6WoBTn1UguRy0PU7bf50nHxeTGNOmwroj4SVfiCVc6waubE74V1i1S+j IqNnn+dhakOW9JuwEMl74ryiI8loKsNS8jWl+oVB2P/+/bo4pAQpxcAixPYskd35B0JM eLPwmyHSaL3Gnk9pOCZ+R06f1Vw2D/wx8LDOBQJQJWgWNoCslQCmNG1kH84Xjw7Oyfeg N3TEBdzm0SpcAuVW0Qjz6XmY5tlp89aw6i8yvROd+Bm4GbTa4kGEBUyJzPHt45n0IDql 4j401T0vEWnfmsIuQbLaq827OMlT4U2rMqfjJ8JnnCyqUbxRa/YVLc8icb3E73FFEknv DgrQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=jZYNM72c; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id g11si2623581ybm.503.2017.12.18.09.45.52 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:45:53 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=jZYNM72c; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:59083 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzU8-0006cZ-HU for patch@linaro.org; Mon, 18 Dec 2017 12:45:52 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37594) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3q-0000iB-F6 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:46 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3o-0001Y2-L6 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:42 -0500 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:39324) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3o-0001X6-BO for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:40 -0500 Received: by mail-pf0-x241.google.com with SMTP id l24so9940974pfj.6 for ; Mon, 18 Dec 2017 09:18:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=8i/JRKaTSpgOJSSRFkW5UKcEKW6ZwDPhYDe8Wqjfx8I=; b=jZYNM72c+UJ2kJmkbJ/YQCAfFmAcrPcvY4k53BsqArSGKITfvRWdXrdzvHaPMXaCPL cUQ+nUloYBzR1GB2qWxRipDYmsq5DPBnORHE8nnx2TxGH9zY1e9CGOo65L5GHTOYK+6Z eZXpbIM4PR3nL3HtDMunHxXVDvlRsOsP3zyng= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=8i/JRKaTSpgOJSSRFkW5UKcEKW6ZwDPhYDe8Wqjfx8I=; b=GmSWftpjaKvDU1oQg6O7ovEtp1q5mVHozwzoOAi6UrU041vsQRUeAtXJoXw+B6MEDB qQVZVCOyopzVi8p/a7GBCcmQ9slEXIai+evk+5Anq2SqMeI3Xyo12iRaFul9Yggx33ye tVYLBAN4b22AKFpvv4ICGUwRyA0214qzE+KpcS9p9qNsUIpTOdtA8ItpYtERbGBDmoQy gqPxCuRMIuqdYU+3SBnh/2X3qzFHZx8ruX2HVvnr4Gr+LD7xq72H1FzYumNn7ipsy3d9 +VzVJjx/RFf/RfBlALgxHNJ/YCD6lRN8s3U8ghx8lc59RdYFToFyDl7mFEqKi3piRbyb hOvg== X-Gm-Message-State: AKGB3mLtjqab77itgGGKLhLaUKsT2E94lH57wHtgjFd1xJcTLZvwyOe0 wFtIpBJ2V25vZ3uPSYs/E1Z61qu+/fI= X-Received: by 10.98.57.131 with SMTP id u3mr404785pfj.7.1513617518894; Mon, 18 Dec 2017 09:18:38 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.37 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:37 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:56 -0800 Message-Id: <20171218171758.16964-25-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v7 24/26] tcg: Add support for 4 operand vector ops X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- tcg/tcg-op-gvec.h | 45 +++++++++-- tcg/tcg-op-gvec.c | 226 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 266 insertions(+), 5 deletions(-) -- 2.14.3 diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 188c3368bd..91459b9c38 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -30,29 +30,43 @@ /* Expand a call to a gvec-style helper, with pointers to two vector operands, and a descriptor (see tcg-gvec-desc.h). */ -typedef void (gen_helper_gvec_2)(TCGv_ptr, TCGv_ptr, TCGv_i32); +typedef void gen_helper_gvec_2(TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_2 *fn); /* Similarly, passing an extra pointer (e.g. env or float_status). */ -typedef void (gen_helper_gvec_2_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +typedef void gen_helper_gvec_2_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_2_ptr *fn); /* Similarly, with three vector operands. */ -typedef void (gen_helper_gvec_3)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +typedef void gen_helper_gvec_3(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_3 *fn); -typedef void (gen_helper_gvec_3_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, - TCGv_ptr, TCGv_i32); +/* Similarly, with four vector operands. */ +typedef void gen_helper_gvec_4(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_4 *fn); + +typedef void gen_helper_gvec_3_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_3_ptr *fn); +typedef void gen_helper_gvec_4_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz, + uint32_t maxsz, int32_t data, + gen_helper_gvec_4_ptr *fn); + /* Expand a gvec operation. Either inline or out-of-line depending on the actual vector size and the operations supported by the host. */ typedef struct { @@ -114,12 +128,33 @@ typedef struct { bool load_dest; } GVecGen3; +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_4 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The data argument to the out-of-line helper. */ + uint32_t data; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; +} GVecGen4; + void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, uint32_t opsz, uint32_t clsz, const GVecGen2 *); void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t opsz, uint32_t clsz, unsigned c, const GVecGen2i *); void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz, const GVecGen3 *); +void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs, + uint32_t opsz, uint32_t clsz, const GVecGen4 *); /* Expand a specific vector operation. */ diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 8ec1817727..5f58859a1e 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -55,6 +55,18 @@ static void check_overlap_3(uint32_t d, uint32_t a, uint32_t b, uint32_t s) check_overlap_2(a, b, s); } +/* Verify vector overlap rules for four operands. */ +static void check_overlap_4(uint32_t d, uint32_t a, uint32_t b, + uint32_t c, uint32_t s) +{ + check_overlap_2(d, a, s); + check_overlap_2(d, b, s); + check_overlap_2(d, c, s); + check_overlap_2(a, b, s); + check_overlap_2(a, c, s); + check_overlap_2(b, c, s); +} + /* Create a descriptor from components. */ uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data) { @@ -118,6 +130,33 @@ void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, tcg_temp_free_i32(desc); } +/* Generate a call to a gvec-style helper with four vector operands. */ +void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_4 *fn) +{ + TCGv_ptr a0, a1, a2, a3; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + a3 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + + fn(a0, a1, a2, a3, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_i32(desc); +} + /* Generate a call to a gvec-style helper with three vector operands and an extra pointer operand. */ void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, @@ -165,6 +204,35 @@ void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, tcg_temp_free_i32(desc); } +/* Generate a call to a gvec-style helper with four vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz, + uint32_t maxsz, int32_t data, + gen_helper_gvec_4_ptr *fn) +{ + TCGv_ptr a0, a1, a2, a3; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + a3 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + + fn(a0, a1, a2, a3, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_i32(desc); +} + /* Return true if we want to implement something of OPRSZ bytes in units of LNSZ. This limits the expansion of inline code. */ static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz) @@ -468,6 +536,30 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs, tcg_temp_free_i32(t0); } +/* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ +static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t opsz, + void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + TCGv_i32 t2 = tcg_temp_new_i32(); + TCGv_i32 t3 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t1, cpu_env, aofs + i); + tcg_gen_ld_i32(t2, cpu_env, bofs + i); + tcg_gen_ld_i32(t3, cpu_env, cofs + i); + fni(t0, t1, t2, t3); + tcg_gen_st_i32(t0, cpu_env, dofs + i); + } + tcg_temp_free_i32(t3); + tcg_temp_free_i32(t2); + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + /* Expand OPSZ bytes worth of two-operand operations using i64 elements. */ static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz, void (*fni)(TCGv_i64, TCGv_i64)) @@ -527,6 +619,30 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs, tcg_temp_free_i64(t0); } +/* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ +static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t opsz, + void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i64(t1, cpu_env, aofs + i); + tcg_gen_ld_i64(t2, cpu_env, bofs + i); + tcg_gen_ld_i64(t3, cpu_env, cofs + i); + fni(t0, t1, t2, t3); + tcg_gen_st_i64(t0, cpu_env, dofs + i); + } + tcg_temp_free_i64(t3); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + /* Expand OPSZ bytes worth of two-operand operations using host vectors. */ static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t opsz, uint32_t tysz, TCGType type, @@ -591,6 +707,32 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_temp_free_vec(t0); } +/* Expand OPSZ bytes worth of four-operand operations using host vectors. */ +static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t cofs, uint32_t opsz, + uint32_t tysz, TCGType type, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, + TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + TCGv_vec t2 = tcg_temp_new_vec(type); + TCGv_vec t3 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t1, cpu_env, aofs + i); + tcg_gen_ld_vec(t2, cpu_env, bofs + i); + tcg_gen_ld_vec(t3, cpu_env, cofs + i); + fni(vece, t0, t1, t2, t3); + tcg_gen_st_vec(t0, cpu_env, dofs + i); + } + tcg_temp_free_vec(t3); + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + /* Expand a vector two-operand operation. */ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g) @@ -831,6 +973,90 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, g->data, g->fno); } +/* Expand a vector four-operand operation. */ +void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs | bofs | cofs); + check_overlap_4(dofs, aofs, bofs, cofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 32, TCG_TYPE_V256, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 16, TCG_TYPE_V128, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 8, TCG_TYPE_V64, g->fniv); + } else if (g->fni8) { + expand_4_i64(dofs, aofs, bofs, cofs, done, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_4_i32(dofs, aofs, bofs, cofs, done, g->fni4); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_4_ool(dofs, aofs, bofs, cofs, oprsz, maxsz, g->data, g->fno); +} + /* * Expand specific vector operations. */ From patchwork Mon Dec 18 17:17:57 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122273 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3130299qgn; Mon, 18 Dec 2017 09:49:18 -0800 (PST) X-Google-Smtp-Source: ACJfBov98bytDLXt/zB3/CNPDHQ6R8NuI/YzNAMl9xqp/FTd01jNk5Fd+RRtGADYcvUyX/w8VKwM X-Received: by 10.37.234.1 with SMTP id p1mr448992ybd.499.1513619358472; Mon, 18 Dec 2017 09:49:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513619358; cv=none; d=google.com; s=arc-20160816; b=o/wBYX/3SC5QJrA8zRv+VEKze0ggwWw6TFAWJbXqBztHRmHZZ/SRswIvjh+qDdO7Ey PCn8S/wzzXtXhGZAnoKcOkNM163l7yQ9ksKhVd8YUexs4zJQlV0z23Ter/p/9tYBb7jO 41xyjnKXnVc+Ei2Tbq28wyAY9hr8dy/RT2DmuxufxbEfKbOwwanulTQkXkXbUhllEihN V6jN9UAPbMq/E4/NHCZdwVzIsMUd/3yR4+D9rDmMD5WU/Hdnzh5tunPFUMVsnzMnmRFc HyqrhGdqN9Ts8r6lj1sJ+5Ka2cXSZyzUQTl3ic/Y9AjnJ+WZGNOOYs3aQd5/AATBmnF4 8VbQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=H3R3QZyLKQHIEEeJRfAS1UU+95fBYiGrwbSpJXiXZnk=; b=z/b5uMni0bCYgnQc83Hg6Wf8PdVQDW02UWx39jaCW7IVjizUjlbZIxxL0uIHYQYQJ+ oyBqZ99NkvyGq6UWRcsF7xrQrf7bS8PN1X2MpC+fuXL97x2zKj2InLlibhu4TNxUvgYF yExNg7oCdbS3hltPBQvHZEr20qg/FeoVWkgXEDRzbNfCv3TkHo8RnFtYsWFo2mUPlRD6 sD+vUd2l1RbQJ2H8T5NmR1boIzkRoTRkPTqwU4FcQ8GWazmgXjJMbNjv+uKSdGIUywcM LRK54QW3KJyb3h+xUbmQaswpT8w38nCF+6L67l2MjwLY7xRAKRX0a+L3ZZxcwCd4RhEI p/SA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Cre1t1mT; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id d184si2575485ybc.795.2017.12.18.09.49.18 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:49:18 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Cre1t1mT; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:59110 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzXS-0002Oy-1M for patch@linaro.org; Mon, 18 Dec 2017 12:49:18 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37629) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3u-0000le-3p for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:47 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3q-0001a6-0E for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:46 -0500 Received: from mail-pl0-x241.google.com ([2607:f8b0:400e:c01::241]:34426) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3p-0001Z7-R2 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:41 -0500 Received: by mail-pl0-x241.google.com with SMTP id d21so5190272pll.1 for ; Mon, 18 Dec 2017 09:18:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=H3R3QZyLKQHIEEeJRfAS1UU+95fBYiGrwbSpJXiXZnk=; b=Cre1t1mTT8kxCsX53uWUcAfdM2foUDqpBCPEoaUaKcG1s1u2J3NRvhYBnxMc+ZrD2v jYTkIs3ywkBaK6Ryn18AEdAJbhK/j1Yhv7TTwIRtqEyFos7y4bqLHcqlNVi2FZBhSBL4 spQcIX+87pqtwzR7XNTfIMjsCOcJ6SN5Che0U= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=H3R3QZyLKQHIEEeJRfAS1UU+95fBYiGrwbSpJXiXZnk=; b=fvJbyJJDF9011tC1bnwf18i4M+A3hru6iQY4EGl4LdqZeV2j46csfsV2qdTCxWjgCF vtNPmH0EzxJ87EA4DWV9PhlMqVVLiZ26P1lIo5HzZ7NrVXUKBD1ppEv+o7815nQpgOIR IYtuksPo52w5qT1R0OkCjq9isQxy7EFz3uJiwad1H983/WgCXVF0QGrkkiZgRwi8oTA2 5jcYcZQqPQbLBUQqGAo7BEDjYGhRwRCSnIkowUX0QvC81liuhUgRTZr6AEN2HvuTzAHw OMLn6wuaC365OQ7QHzSyDoYMeygHq3VF8qZDDGsmlGFeNOyd7XREdo0bHRnMFGn214S7 mVMA== X-Gm-Message-State: AKGB3mKAYf3n4ba4kZXlcxSVudtWSdv1EvUHaWgdqSdpG+w/XWuYD0XX uW2SIUhbTJKZHKdj7mlF3Gw0Pf+mDj8= X-Received: by 10.159.195.69 with SMTP id z5mr423369pln.180.1513617520537; Mon, 18 Dec 2017 09:18:40 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.38 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:39 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:57 -0800 Message-Id: <20171218171758.16964-26-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::241 Subject: [Qemu-devel] [PATCH v7 25/26] tcg: Add support for 5 operand vector ops X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- tcg/tcg-op-gvec.h | 7 +++++++ tcg/tcg-op-gvec.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) -- 2.14.3 diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 91459b9c38..3a927ce24d 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -54,6 +54,13 @@ void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_4 *fn); +/* Similarly, with five vector operands. */ +typedef void gen_helper_gvec_5(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_5_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t xofs, uint32_t oprsz, + uint32_t maxsz, int32_t data, gen_helper_gvec_5 *fn); + typedef void gen_helper_gvec_3_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 5f58859a1e..a6df90b284 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -157,6 +157,36 @@ void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, tcg_temp_free_i32(desc); } +/* Generate a call to a gvec-style helper with five vector operands. */ +void tcg_gen_gvec_5_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t xofs, uint32_t oprsz, + uint32_t maxsz, int32_t data, gen_helper_gvec_5 *fn) +{ + TCGv_ptr a0, a1, a2, a3, a4; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + a3 = tcg_temp_new_ptr(); + a4 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + tcg_gen_addi_ptr(a4, cpu_env, xofs); + + fn(a0, a1, a2, a3, a4, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_ptr(a4); + tcg_temp_free_i32(desc); +} + /* Generate a call to a gvec-style helper with three vector operands and an extra pointer operand. */ void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, From patchwork Mon Dec 18 17:17:58 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 122270 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp3127789qgn; Mon, 18 Dec 2017 09:46:34 -0800 (PST) X-Google-Smtp-Source: ACJfBovDTLuSSpTYBVa2c+IpDLULvJOMoI8a5alw9YXDNhrpMkFV3lqTmX24aAsrY6BQlvznSOMC X-Received: by 10.129.51.72 with SMTP id z69mr446988ywz.390.1513619194892; Mon, 18 Dec 2017 09:46:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1513619194; cv=none; d=google.com; s=arc-20160816; b=OxmvaGY5labLQshayrSEo8YTUXpZR2N0UgrP49Qcw8kyLBYQ1asllYYxtizFdOuzHy SXVWSYjo7xc1DwTAKYMBOMWJwzBwqwfACTZvjbJgqqTPPPx/saa5HR6j88XskE0S2YlE D5PAUOHeZDHi401h4Vz25CqMsy8oBtNLPsadev10mc78BP7XLOXAyc+O66yQ4o2/Kfd8 JFVX4AF5a7OW+7GNytXZykAVEV7Fe0oT0EukV1vNXE7ywv1Ev+uu21+k10kYkrEVrYag k9GGG14VLC0vBb+L7f3yND0rPQsF/UBcLmXIKkVWJZllZonBwLh3Ldd4nFPgTRrZ+Jfz RtyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=RkiWrDh1A4Eq7qdNT4gb8Zr8zGbgABfWRPg7U04keV0=; b=C5dTMIZPcAihBZ2bQxvLBO5Vj3TzAxosGUXiGux8XKFk5OvHYt00N4RV012l9MQRuP aSqp4ZoBxbEIrwoY5vOuD2HZ+fi2oVCEDWALBQ0uUqoKZVf6WnQZ0oj1ov0Y8UoLxdlu N2Nr+Z+YPfzkHEn7HJ2y8cktZgOwJ0NWtg4M3BJY1GHKNlG8OInRZn1TIvzQaC1h6fqO ULrMbi2S2p6g/Smta7P+yp8+oJ3c3xYEfKX0kC2DoznEw70BqWIJi963qlOXao52OasB a1cXPTKgoNIw13GdoAsPmZYuPKG1LvowSUlDJGFMUa3ftZTxX6D0RlCQSYQPspD7Pp9g 32Bg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=YIK7sHxy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id h3si2751941ybi.790.2017.12.18.09.46.34 for (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 18 Dec 2017 09:46:34 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=YIK7sHxy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:59097 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQzUo-00080Z-9V for patch@linaro.org; Mon, 18 Dec 2017 12:46:34 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37648) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eQz3w-0000rC-Q6 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:50 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eQz3r-0001cF-M8 for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:48 -0500 Received: from mail-pf0-x244.google.com ([2607:f8b0:400e:c00::244]:36778) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eQz3r-0001bD-CP for qemu-devel@nongnu.org; Mon, 18 Dec 2017 12:18:43 -0500 Received: by mail-pf0-x244.google.com with SMTP id p84so9944796pfd.3 for ; Mon, 18 Dec 2017 09:18:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=RkiWrDh1A4Eq7qdNT4gb8Zr8zGbgABfWRPg7U04keV0=; b=YIK7sHxyZX6ZO73QLRYYKnTH6zrn5OSHhprPjjABgT23nCKf1UeevIQmpLAhp6MwN2 Q7f4jM1cLdhlurmoqPu0+9GWxiKSCVJhZaow1ggqU0ETVb7j3yajdwdTuQzVcVANrKRh bgZAoqLVXpGdCqnK/AF9KmGv2Srhv9+DrY8K4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=RkiWrDh1A4Eq7qdNT4gb8Zr8zGbgABfWRPg7U04keV0=; b=U5wvMro/OcF2exZPT+szuOEXJomDLxI7pvwlt0XiSwn1fy5Rt5373PjSE4uKxPSJH3 GBel4XlWTgasl6I4qe7tHaHm3Hjl5h/aSuH+tkMNm0F1BNtOmFJbzD0xgbMwhrrwP3kT BTht02Q5i0go3X6FypQ9QDzywNXQcA4FPAtE+QStN3wKd8rV225YtMIyKTIYY2IPTjgO zj3/nzc35fO2ZyXUVj8wBV55CuFb2udDJCwZXJ7aYFCSKcqnMdHw3sPIdUH4SZWR6Ohu rl8ZPhAdFjZi3cFZQJMGleA6+Egj9sUzHmuaixHTupCCuVNbW/FOo5lXvZEmbBnvxgPD sZkQ== X-Gm-Message-State: AKGB3mJ/QOyvaFfSuYgZ8XH5vDJAYqcAWZb3CyEFAh6o6JPRYfp3n+d0 Ir2jeTodoZIpUUrPe5rXJtfFNhvSC0g= X-Received: by 10.98.192.202 with SMTP id g71mr405259pfk.33.1513617522003; Mon, 18 Dec 2017 09:18:42 -0800 (PST) Received: from cloudburst.twiddle.net (174-21-7-63.tukw.qwest.net. [174.21.7.63]) by smtp.gmail.com with ESMTPSA id y19sm21050272pgv.19.2017.12.18.09.18.40 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 18 Dec 2017 09:18:41 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Mon, 18 Dec 2017 09:17:58 -0800 Message-Id: <20171218171758.16964-27-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20171218171758.16964-1-richard.henderson@linaro.org> References: <20171218171758.16964-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::244 Subject: [Qemu-devel] [PATCH v7 26/26] tcg: Add generic helpers for saturating arithmetic X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" No vector ops as yet. SSE only has direct support for 8- and 16-bit saturation; handling 32- and 64-bit saturation is much more expensive. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 20 ++++ tcg/tcg-op-gvec.h | 10 ++ accel/tcg/tcg-runtime-gvec.c | 268 +++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-gvec.c | 92 +++++++++++++++ 4 files changed, 390 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index d1b3542946..ec187a094b 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -157,6 +157,26 @@ DEF_HELPER_FLAGS_4(gvec_mul16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_sssub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sssub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sssub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sssub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_usadd8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_usadd16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_usadd32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_usadd64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_ussub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ussub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ussub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ussub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 3a927ce24d..64116d4291 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -179,6 +179,16 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); +/* Saturated arithmetic. */ +void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_sssub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); + void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs, diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index ff26be0744..398ca54e1e 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -614,3 +614,271 @@ DO_EXT(gvec_extu32, uint32_t, uint64_t) DO_EXT(gvec_exts8, int8_t, int16_t) DO_EXT(gvec_exts16, int16_t, int32_t) DO_EXT(gvec_exts32, int32_t, int64_t) + +void HELPER(gvec_ssadd8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int8_t)) { + int r = *(int8_t *)(a + i) + *(int8_t *)(b + i); + if (r > INT8_MAX) { + r = INT8_MAX; + } else if (r < INT8_MIN) { + r = INT8_MIN; + } + *(int8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ssadd16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int16_t)) { + int r = *(int16_t *)(a + i) + *(int16_t *)(b + i); + if (r > INT16_MAX) { + r = INT16_MAX; + } else if (r < INT16_MIN) { + r = INT16_MIN; + } + *(int16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ssadd32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int32_t)) { + int32_t ai = *(int32_t *)(a + i); + int32_t bi = *(int32_t *)(b + i); + int32_t di = ai + bi; + if (((di ^ ai) &~ (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT32_MAX : INT32_MIN); + } + *(int32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ssadd64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int64_t)) { + int64_t ai = *(int64_t *)(a + i); + int64_t bi = *(int64_t *)(b + i); + int64_t di = ai + bi; + if (((di ^ ai) &~ (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT64_MAX : INT64_MIN); + } + *(int64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint8_t)) { + int r = *(uint8_t *)(a + i) - *(uint8_t *)(b + i); + if (r > INT8_MAX) { + r = INT8_MAX; + } else if (r < INT8_MIN) { + r = INT8_MIN; + } + *(uint8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int16_t)) { + int r = *(int16_t *)(a + i) - *(int16_t *)(b + i); + if (r > INT16_MAX) { + r = INT16_MAX; + } else if (r < INT16_MIN) { + r = INT16_MIN; + } + *(int16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int32_t)) { + int32_t ai = *(int32_t *)(a + i); + int32_t bi = *(int32_t *)(b + i); + int32_t di = ai - bi; + if (((di ^ ai) & (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT32_MAX : INT32_MIN); + } + *(int32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int64_t)) { + int64_t ai = *(int64_t *)(a + i); + int64_t bi = *(int64_t *)(b + i); + int64_t di = ai - bi; + if (((di ^ ai) & (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT64_MAX : INT64_MIN); + } + *(int64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint8_t)) { + unsigned r = *(uint8_t *)(a + i) + *(uint8_t *)(b + i); + if (r > UINT8_MAX) { + r = UINT8_MAX; + } + *(uint8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint16_t)) { + unsigned r = *(uint16_t *)(a + i) + *(uint16_t *)(b + i); + if (r > UINT16_MAX) { + r = UINT16_MAX; + } + *(uint16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint32_t)) { + uint32_t ai = *(uint32_t *)(a + i); + uint32_t bi = *(uint32_t *)(b + i); + uint32_t di = ai + bi; + if (di < ai) { + di = UINT32_MAX; + } + *(uint32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint64_t)) { + uint64_t ai = *(uint64_t *)(a + i); + uint64_t bi = *(uint64_t *)(b + i); + uint64_t di = ai + bi; + if (di < ai) { + di = UINT64_MAX; + } + *(uint64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint8_t)) { + int r = *(uint8_t *)(a + i) - *(uint8_t *)(b + i); + if (r < 0) { + r = 0; + } + *(uint8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint16_t)) { + int r = *(uint16_t *)(a + i) - *(uint16_t *)(b + i); + if (r < 0) { + r = 0; + } + *(uint16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint32_t)) { + uint32_t ai = *(uint32_t *)(a + i); + uint32_t bi = *(uint32_t *)(b + i); + uint32_t di = ai - bi; + if (ai < bi) { + di = 0; + } + *(uint32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint64_t)) { + uint64_t ai = *(uint64_t *)(a + i); + uint64_t bi = *(uint64_t *)(b + i); + uint64_t di = ai - bi; + if (ai < bi) { + di = 0; + } + *(uint64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index a6df90b284..ddf0354106 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1393,6 +1393,98 @@ void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); } +void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_ssadd8, .vece = MO_8 }, + { .fno = gen_helper_gvec_ssadd16, .vece = MO_16 }, + { .fno = gen_helper_gvec_ssadd32, .vece = MO_32 }, + { .fno = gen_helper_gvec_ssadd64, .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +void tcg_gen_gvec_sssub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_sssub8, .vece = MO_8 }, + { .fno = gen_helper_gvec_sssub16, .vece = MO_16 }, + { .fno = gen_helper_gvec_sssub32, .vece = MO_32 }, + { .fno = gen_helper_gvec_sssub64, .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +static void tcg_gen_vec_usadd32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + TCGv_i32 max = tcg_const_i32(-1); + tcg_gen_add_i32(d, a, b); + tcg_gen_movcond_i32(TCG_COND_LTU, d, d, a, max, d); + tcg_temp_free_i32(max); +} + +static void tcg_gen_vec_usadd32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 max = tcg_const_i64(-1); + tcg_gen_add_i64(d, a, b); + tcg_gen_movcond_i64(TCG_COND_LTU, d, d, a, max, d); + tcg_temp_free_i64(max); +} + +void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_usadd8, .vece = MO_8 }, + { .fno = gen_helper_gvec_usadd16, .vece = MO_16 }, + { .fni4 = tcg_gen_vec_usadd32_i32, + .fno = gen_helper_gvec_usadd32, + .vece = MO_32 }, + { .fni8 = tcg_gen_vec_usadd32_i64, + .fno = gen_helper_gvec_usadd64, + .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +static void tcg_gen_vec_ussub32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + TCGv_i32 min = tcg_const_i32(0); + tcg_gen_sub_i32(d, a, b); + tcg_gen_movcond_i32(TCG_COND_LTU, d, a, b, min, d); + tcg_temp_free_i32(min); +} + +static void tcg_gen_vec_ussub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 min = tcg_const_i64(0); + tcg_gen_sub_i64(d, a, b); + tcg_gen_movcond_i64(TCG_COND_LTU, d, a, b, min, d); + tcg_temp_free_i64(min); +} + +void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_ussub8, .vece = MO_8 }, + { .fno = gen_helper_gvec_ussub16, .vece = MO_16 }, + { .fni4 = tcg_gen_vec_ussub32_i32, + .fno = gen_helper_gvec_ussub32, + .vece = MO_32 }, + { .fni8 = tcg_gen_vec_ussub32_i64, + .fno = gen_helper_gvec_ussub64, + .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + /* Perform a vector negation using normal negation and a mask. Compare gen_subv_mask above. */ static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m)