From patchwork Sat Sep 16 02:34:12 2017
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: alex.bennee@linaro.org, f4bug@amsat.org
Date: Fri, 15 Sep 2017 19:34:12 -0700
Message-Id: <20170916023417.14599-2-richard.henderson@linaro.org>
In-Reply-To: <20170916023417.14599-1-richard.henderson@linaro.org>
References: <20170916023417.14599-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v3 1/6] tcg: Add types and operations for host vectors

Nothing uses or enables them yet.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.h  |  26 +++++++
 tcg/tcg-opc.h |  37 ++++++++++
 tcg/tcg.h     |  34 +++++++++
 tcg/tcg-op.c  | 234 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c     |  77 ++++++++++++++++++-
 tcg/README    |  46 ++++++++++++
 6 files changed, 453 insertions(+), 1 deletion(-)

-- 
2.13.5

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 5d3278f243..b9b0b9f46f 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -915,6 +915,32 @@ void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp);
 void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 
+void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
+void tcg_gen_movi_vec(TCGv_vec, tcg_target_long);
+void tcg_gen_add8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_add16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_add32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_add64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_sub8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_sub16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_sub32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_sub64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_and_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_or_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_xor_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_andc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_orc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b);
+void tcg_gen_not_vec(TCGv_vec r, TCGv_vec a);
+void tcg_gen_neg8_vec(TCGv_vec r, TCGv_vec a);
+void tcg_gen_neg16_vec(TCGv_vec r, TCGv_vec a);
+void tcg_gen_neg32_vec(TCGv_vec r, TCGv_vec a);
+void tcg_gen_neg64_vec(TCGv_vec r, TCGv_vec a);
+
+void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
+void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
+void tcg_gen_ldz_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType sz);
+void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType sz);
+
 #if TARGET_LONG_BITS == 64
 #define tcg_gen_movi_tl tcg_gen_movi_i64
 #define tcg_gen_mov_tl tcg_gen_mov_i64
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 956fb1e9f3..8200184fa9 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -204,8 +204,45 @@ DEF(qemu_ld_i64, DATA64_ARGS, TLADDR_ARGS, 1,
 DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT)
 
+/* Host vector support.  */
+
+#define IMPLVEC \
+    IMPL(TCG_TARGET_HAS_v64 | TCG_TARGET_HAS_v128 | TCG_TARGET_HAS_v256)
+
+DEF(mov_vec, 1, 1, 1, TCG_OPF_NOT_PRESENT)
+
+/* ??? Simple, but perhaps dupiN would be more descriptive.  */
+DEF(movi_vec, 1, 0, 2, TCG_OPF_NOT_PRESENT)
+
+DEF(ld_vec, 1, 1, 2, IMPLVEC)
+DEF(ldz_vec, 1, 1, 3, IMPLVEC)
+DEF(st_vec, 0, 2, 2, IMPLVEC)
+
+DEF(add8_vec, 1, 2, 1, IMPLVEC)
+DEF(add16_vec, 1, 2, 1, IMPLVEC)
+DEF(add32_vec, 1, 2, 1, IMPLVEC)
+DEF(add64_vec, 1, 2, 1, IMPLVEC)
+
+DEF(sub8_vec, 1, 2, 1, IMPLVEC)
+DEF(sub16_vec, 1, 2, 1, IMPLVEC)
+DEF(sub32_vec, 1, 2, 1, IMPLVEC)
+DEF(sub64_vec, 1, 2, 1, IMPLVEC)
+
+DEF(neg8_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
+DEF(neg16_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
+DEF(neg32_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
+DEF(neg64_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
+
+DEF(and_vec, 1, 2, 1, IMPLVEC)
+DEF(or_vec, 1, 2, 1, IMPLVEC)
+DEF(xor_vec, 1, 2, 1, IMPLVEC)
+DEF(andc_vec, 1, 2, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec))
+DEF(orc_vec, 1, 2, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec))
+DEF(not_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec))
+
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
 #undef IMPL
 #undef IMPL64
+#undef IMPLVEC
 #undef DEF
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 25662c36d4..7cd356e87f 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -173,6 +173,16 @@ typedef uint64_t TCGRegSet;
 # error "Missing unsigned widening multiply"
 #endif
+#ifndef TCG_TARGET_HAS_v64
+#define TCG_TARGET_HAS_v64      0
+#define TCG_TARGET_HAS_v128     0
+#define TCG_TARGET_HAS_v256     0
+#define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_not_vec  0
+#define TCG_TARGET_HAS_andc_vec 0
+#define TCG_TARGET_HAS_orc_vec  0
+#endif
+
 #ifndef TARGET_INSN_START_EXTRA_WORDS
 # define TARGET_INSN_START_WORDS 1
 #else
@@ -249,6 +259,11 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
+
+    TCG_TYPE_V64,
+    TCG_TYPE_V128,
+    TCG_TYPE_V256,
+
     TCG_TYPE_COUNT, /* number of different types */
 
     /* An alias for the size of the host register.  */
@@ -399,6 +414,8 @@ typedef tcg_target_ulong TCGArg;
  * TCGv_i32 : 32 bit integer type
  * TCGv_i64 : 64 bit integer type
  * TCGv_ptr : a host pointer type
+ * TCGv_vec : a host vector type; the exact size is not exposed
+             to the CPU front-end code.
  * TCGv : an integer type the same size as target_ulong
          (an alias for either TCGv_i32 or TCGv_i64)
   The compiler's type checking will complain if you mix them
@@ -424,6 +441,7 @@ typedef tcg_target_ulong TCGArg;
 typedef struct TCGv_i32_d *TCGv_i32;
 typedef struct TCGv_i64_d *TCGv_i64;
 typedef struct TCGv_ptr_d *TCGv_ptr;
+typedef struct TCGv_vec_d *TCGv_vec;
 typedef TCGv_ptr TCGv_env;
 #if TARGET_LONG_BITS == 32
 #define TCGv TCGv_i32
@@ -448,6 +466,11 @@ static inline TCGv_ptr QEMU_ARTIFICIAL MAKE_TCGV_PTR(intptr_t i)
     return (TCGv_ptr)i;
 }
 
+static inline TCGv_vec QEMU_ARTIFICIAL MAKE_TCGV_VEC(intptr_t i)
+{
+    return (TCGv_vec)i;
+}
+
 static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_I32(TCGv_i32 t)
 {
     return (intptr_t)t;
@@ -463,6 +486,11 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
     return (intptr_t)t;
 }
 
+static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_VEC(TCGv_vec t)
+{
+    return (intptr_t)t;
+}
+
 #if TCG_TARGET_REG_BITS == 32
 #define TCGV_LOW(t) MAKE_TCGV_I32(GET_TCGV_I64(t))
 #define TCGV_HIGH(t) MAKE_TCGV_I32(GET_TCGV_I64(t) + 1)
@@ -471,15 +499,18 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 #define TCGV_EQUAL_I32(a, b) (GET_TCGV_I32(a) == GET_TCGV_I32(b))
 #define TCGV_EQUAL_I64(a, b) (GET_TCGV_I64(a) == GET_TCGV_I64(b))
 #define TCGV_EQUAL_PTR(a, b) (GET_TCGV_PTR(a) == GET_TCGV_PTR(b))
+#define TCGV_EQUAL_VEC(a, b) (GET_TCGV_VEC(a) == GET_TCGV_VEC(b))
 
 /* Dummy definition to avoid compiler warnings.  */
 #define TCGV_UNUSED_I32(x) x = MAKE_TCGV_I32(-1)
 #define TCGV_UNUSED_I64(x) x = MAKE_TCGV_I64(-1)
 #define TCGV_UNUSED_PTR(x) x = MAKE_TCGV_PTR(-1)
+#define TCGV_UNUSED_VEC(x) x = MAKE_TCGV_VEC(-1)
 
 #define TCGV_IS_UNUSED_I32(x) (GET_TCGV_I32(x) == -1)
 #define TCGV_IS_UNUSED_I64(x) (GET_TCGV_I64(x) == -1)
 #define TCGV_IS_UNUSED_PTR(x) (GET_TCGV_PTR(x) == -1)
+#define TCGV_IS_UNUSED_VEC(x) (GET_TCGV_VEC(x) == -1)
 
 /* call flags */
 /* Helper does not read globals (either directly or through an exception).  It
@@ -790,9 +821,12 @@ TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name);
 
 TCGv_i32 tcg_temp_new_internal_i32(int temp_local);
 TCGv_i64 tcg_temp_new_internal_i64(int temp_local);
+TCGv_vec tcg_temp_new_vec(TCGType type);
+TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match);
 
 void tcg_temp_free_i32(TCGv_i32 arg);
 void tcg_temp_free_i64(TCGv_i64 arg);
+void tcg_temp_free_vec(TCGv_vec arg);
 
 static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg,
                                               intptr_t offset, const char *name)
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 688d91755b..50b3177e5f 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3072,3 +3072,237 @@ static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)
 
 GEN_ATOMIC_HELPER(xchg, mov2, 0)
 
 #undef GEN_ATOMIC_HELPER
+
+static void tcg_gen_op2_vec(TCGOpcode opc, TCGv_vec r, TCGv_vec a)
+{
+    TCGArg ri = GET_TCGV_VEC(r);
+    TCGArg ai = GET_TCGV_VEC(a);
+    TCGTemp *rt = &tcg_ctx.temps[ri];
+    TCGTemp *at = &tcg_ctx.temps[ai];
+    TCGType type = rt->base_type;
+
+    tcg_debug_assert(at->base_type == type);
+    tcg_gen_op3(&tcg_ctx, opc, ri, ai, type - TCG_TYPE_V64);
+}
+
+static void tcg_gen_op3_vec(TCGOpcode opc, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    TCGArg ri = GET_TCGV_VEC(r);
+    TCGArg ai = GET_TCGV_VEC(a);
+    TCGArg bi = GET_TCGV_VEC(b);
+    TCGTemp *rt = &tcg_ctx.temps[ri];
+    TCGTemp *at = &tcg_ctx.temps[ai];
+    TCGTemp *bt = &tcg_ctx.temps[bi];
+    TCGType type = rt->base_type;
+
+    tcg_debug_assert(at->base_type == type);
+    tcg_debug_assert(bt->base_type == type);
+    tcg_gen_op4(&tcg_ctx, opc, ri, ai, bi, type - TCG_TYPE_V64);
+}
+
+void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)
+{
+    if (!TCGV_EQUAL_VEC(r, a)) {
+        tcg_gen_op2_vec(INDEX_op_mov_vec, r, a);
+    }
+}
+
+void tcg_gen_movi_vec(TCGv_vec r, tcg_target_long a)
+{
+    TCGArg ri = GET_TCGV_VEC(r);
+    TCGTemp *rt = &tcg_ctx.temps[ri];
+    TCGType type = rt->base_type;
+
+    tcg_debug_assert(a == 0 || a == -1);
+    tcg_gen_op3(&tcg_ctx, INDEX_op_movi_vec, ri, a, type - TCG_TYPE_V64);
+}
+
+void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr b, TCGArg o)
+{
+    TCGArg ri = GET_TCGV_VEC(r);
+    TCGArg bi = GET_TCGV_PTR(b);
+    TCGTemp *rt = &tcg_ctx.temps[ri];
+    TCGType type = rt->base_type;
+
+    tcg_gen_op4(&tcg_ctx, INDEX_op_ld_vec, ri, bi, o, type - TCG_TYPE_V64);
+}
+
+void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr b, TCGArg o)
+{
+    TCGArg ri = GET_TCGV_VEC(r);
+    TCGArg bi = GET_TCGV_PTR(b);
+    TCGTemp *rt = &tcg_ctx.temps[ri];
+    TCGType type = rt->base_type;
+
+    tcg_gen_op4(&tcg_ctx, INDEX_op_st_vec, ri, bi, o, type - TCG_TYPE_V64);
+}
+
+/* Load data into a vector R from B+O using TYPE.  If R is wider than TYPE,
+   fill the high bits with zeros.  */
+void tcg_gen_ldz_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType type)
+{
+    TCGArg ri = GET_TCGV_VEC(r);
+    TCGArg bi = GET_TCGV_PTR(b);
+    TCGTemp *rt = &tcg_ctx.temps[ri];
+    TCGType btype = rt->base_type;
+
+    if (type < btype) {
+        tcg_gen_op5(&tcg_ctx, INDEX_op_ldz_vec, ri, bi, o,
+                    type - TCG_TYPE_V64, btype - TCG_TYPE_V64);
+    } else {
+        tcg_debug_assert(type == btype);
+        tcg_gen_op4(&tcg_ctx, INDEX_op_ld_vec, ri, bi, o, type - TCG_TYPE_V64);
+    }
+}
+
+/* Store data from vector R into B+O using TYPE.  If R is wider than TYPE,
+   store only the low bits.  */
+void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType type)
+{
+    TCGArg ri = GET_TCGV_VEC(r);
+    TCGArg bi = GET_TCGV_PTR(b);
+    TCGTemp *rt = &tcg_ctx.temps[ri];
+    TCGType btype = rt->base_type;
+
+    tcg_debug_assert(type <= btype);
+    tcg_gen_op4(&tcg_ctx, INDEX_op_st_vec, ri, bi, o, type - TCG_TYPE_V64);
+}
+
+void tcg_gen_add8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_add8_vec, r, a, b);
+}
+
+void tcg_gen_add16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_add16_vec, r, a, b);
+}
+
+void tcg_gen_add32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_add32_vec, r, a, b);
+}
+
+void tcg_gen_add64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_add64_vec, r, a, b);
+}
+
+void tcg_gen_sub8_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_sub8_vec, r, a, b);
+}
+
+void tcg_gen_sub16_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_sub16_vec, r, a, b);
+}
+
+void tcg_gen_sub32_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_sub32_vec, r, a, b);
+}
+
+void tcg_gen_sub64_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_sub64_vec, r, a, b);
+}
+
+void tcg_gen_and_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_and_vec, r, a, b);
+}
+
+void tcg_gen_or_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_or_vec, r, a, b);
+}
+
+void tcg_gen_xor_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    tcg_gen_op3_vec(INDEX_op_xor_vec, r, a, b);
+}
+
+void tcg_gen_andc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    if (TCG_TARGET_HAS_andc_vec) {
+        tcg_gen_op3_vec(INDEX_op_andc_vec, r, a, b);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_not_vec(t, b);
+        tcg_gen_and_vec(r, a, t);
+        tcg_temp_free_vec(t);
+    }
+}
+
+void tcg_gen_orc_vec(TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    if (TCG_TARGET_HAS_orc_vec) {
+        tcg_gen_op3_vec(INDEX_op_orc_vec, r, a, b);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_not_vec(t, b);
+        tcg_gen_or_vec(r, a, t);
+        tcg_temp_free_vec(t);
+    }
+}
+
+void tcg_gen_not_vec(TCGv_vec r, TCGv_vec a)
+{
+    if (TCG_TARGET_HAS_not_vec) {
+        tcg_gen_op2_vec(INDEX_op_not_vec, r, a);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_movi_vec(t, -1);
+        tcg_gen_xor_vec(r, a, t);
+        tcg_temp_free_vec(t);
+    }
+}
+
+void tcg_gen_neg8_vec(TCGv_vec r, TCGv_vec a)
+{
+    if (TCG_TARGET_HAS_neg_vec) {
+        tcg_gen_op2_vec(INDEX_op_neg8_vec, r, a);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_movi_vec(t, 0);
+        tcg_gen_sub8_vec(r, t, a);
+        tcg_temp_free_vec(t);
+    }
+}
+
+void tcg_gen_neg16_vec(TCGv_vec r, TCGv_vec a)
+{
+    if (TCG_TARGET_HAS_neg_vec) {
+        tcg_gen_op2_vec(INDEX_op_neg16_vec, r, a);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_movi_vec(t, 0);
+        tcg_gen_sub16_vec(r, t, a);
+        tcg_temp_free_vec(t);
+    }
+}
+
+void tcg_gen_neg32_vec(TCGv_vec r, TCGv_vec a)
+{
+    if (TCG_TARGET_HAS_neg_vec) {
+        tcg_gen_op2_vec(INDEX_op_neg32_vec, r, a);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_movi_vec(t, 0);
+        tcg_gen_sub32_vec(r, t, a);
+        tcg_temp_free_vec(t);
+    }
+}
+
+void tcg_gen_neg64_vec(TCGv_vec r, TCGv_vec a)
+{
+    if (TCG_TARGET_HAS_neg_vec) {
+        tcg_gen_op2_vec(INDEX_op_neg64_vec, r, a);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_movi_vec(t, 0);
+        tcg_gen_sub64_vec(r, t, a);
+        tcg_temp_free_vec(t);
+    }
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index dff9999bc6..a4d55efdf0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -116,7 +116,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 static bool tcg_out_ldst_finalize(TCGContext *s);
 #endif
 
-static TCGRegSet tcg_target_available_regs[2];
+static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT];
 static TCGRegSet tcg_target_call_clobber_regs;
 
 #if TCG_TARGET_INSN_UNIT_SIZE == 1
@@ -664,6 +664,44 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
     return MAKE_TCGV_I64(idx);
 }
 
+TCGv_vec tcg_temp_new_vec(TCGType type)
+{
+    int idx;
+
+#ifdef CONFIG_DEBUG_TCG
+    switch (type) {
+    case TCG_TYPE_V64:
+        assert(TCG_TARGET_HAS_v64);
+        break;
+    case TCG_TYPE_V128:
+        assert(TCG_TARGET_HAS_v128);
+        break;
+    case TCG_TYPE_V256:
+        assert(TCG_TARGET_HAS_v256);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+#endif
+
+    idx = tcg_temp_new_internal(type, 0);
+    return MAKE_TCGV_VEC(idx);
+}
+
+TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match)
+{
+    TCGContext *s = &tcg_ctx;
+    int idx = GET_TCGV_VEC(match);
+    TCGTemp *ts;
+
+    tcg_debug_assert(idx >= s->nb_globals && idx < s->nb_temps);
+    ts = &s->temps[idx];
+    tcg_debug_assert(ts->temp_allocated != 0);
+
+    idx = tcg_temp_new_internal(ts->base_type, 0);
+    return MAKE_TCGV_VEC(idx);
+}
+
 static void tcg_temp_free_internal(int idx)
 {
     TCGContext *s = &tcg_ctx;
@@ -696,6 +734,11 @@ void tcg_temp_free_i64(TCGv_i64 arg)
     tcg_temp_free_internal(GET_TCGV_I64(arg));
 }
 
+void tcg_temp_free_vec(TCGv_vec arg)
+{
+    tcg_temp_free_internal(GET_TCGV_VEC(arg));
+}
+
 TCGv_i32 tcg_const_i32(int32_t val)
 {
     TCGv_i32 t0;
@@ -753,6 +796,9 @@ int tcg_check_temp_count(void)
    Test the runtime variable that controls each opcode.  */
 bool tcg_op_supported(TCGOpcode op)
 {
+    const bool have_vec
+        = TCG_TARGET_HAS_v64 | TCG_TARGET_HAS_v128 | TCG_TARGET_HAS_v256;
+
     switch (op) {
     case INDEX_op_discard:
     case INDEX_op_set_label:
@@ -966,6 +1012,35 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_mulsh_i64:
         return TCG_TARGET_HAS_mulsh_i64;
 
+    case INDEX_op_mov_vec:
+    case INDEX_op_movi_vec:
+    case INDEX_op_ld_vec:
+    case INDEX_op_ldz_vec:
+    case INDEX_op_st_vec:
+    case INDEX_op_add8_vec:
+    case INDEX_op_add16_vec:
+    case INDEX_op_add32_vec:
+    case INDEX_op_add64_vec:
+    case INDEX_op_sub8_vec:
+    case INDEX_op_sub16_vec:
+    case INDEX_op_sub32_vec:
+    case INDEX_op_sub64_vec:
+    case INDEX_op_and_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_xor_vec:
+        return have_vec;
+    case INDEX_op_not_vec:
+        return have_vec && TCG_TARGET_HAS_not_vec;
+    case INDEX_op_neg8_vec:
+    case INDEX_op_neg16_vec:
+    case INDEX_op_neg32_vec:
+    case INDEX_op_neg64_vec:
+        return have_vec && TCG_TARGET_HAS_neg_vec;
+    case INDEX_op_andc_vec:
+        return have_vec && TCG_TARGET_HAS_andc_vec;
+    case INDEX_op_orc_vec:
+        return have_vec && TCG_TARGET_HAS_orc_vec;
+
     case NB_OPS:
         break;
     }
diff --git a/tcg/README b/tcg/README
index 03bfb6acd4..3bf3af67db 100644
--- a/tcg/README
+++ b/tcg/README
@@ -503,6 +503,52 @@ of the memory access.
 For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a
 64-bit memory access specified in flags.
 
+********* Host vector operations
+
+All of the vector ops have a final constant argument that specifies the
+length of the vector operation LEN as 64 << LEN bits.
+
+* mov_vec   v0, v1, len
+* ld_vec    v0, t1, len
+* st_vec    v0, t1, len
+
+  Move, load and store.
+
+* movi_vec  v0, c, len
+
+  Copy C across the entire vector.
+  At present the only supported values for C are 0 and -1.
+
+* add8_vec   v0, v1, v2, len
+* add16_vec  v0, v1, v2, len
+* add32_vec  v0, v1, v2, len
+* add64_vec  v0, v1, v2, len
+
+  v0 = v1 + v2, in elements of 8/16/32/64 bits, across len.
+
+* sub8_vec   v0, v1, v2, len
+* sub16_vec  v0, v1, v2, len
+* sub32_vec  v0, v1, v2, len
+* sub64_vec  v0, v1, v2, len
+
+  Similarly, v0 = v1 - v2.
+
+* neg8_vec   v0, v1, len
+* neg16_vec  v0, v1, len
+* neg32_vec  v0, v1, len
+* neg64_vec  v0, v1, len
+
+  Similarly, v0 = -v1.
+
+* and_vec   v0, v1, v2, len
+* or_vec    v0, v1, v2, len
+* xor_vec   v0, v1, v2, len
+* andc_vec  v0, v1, v2, len
+* orc_vec   v0, v1, v2, len
+* not_vec   v0, v1, len
+
+  Similarly, logical operations.
+
 *********
 
 Note 1: Some shortcuts are defined when the last operand is known to be

From patchwork Sat Sep 16 02:34:13 2017
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: alex.bennee@linaro.org, f4bug@amsat.org
Date: Fri, 15 Sep 2017 19:34:13 -0700
Message-Id: <20170916023417.14599-3-richard.henderson@linaro.org>
In-Reply-To: <20170916023417.14599-1-richard.henderson@linaro.org>
References: <20170916023417.14599-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v3 2/6] tcg: Add vector expanders

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 Makefile.target              |   2 +-
 accel/tcg/tcg-runtime.h      |  24 ++
 tcg/tcg-gvec-desc.h          |  49 +++
 tcg/tcg-op-gvec.h            | 143 ++++++++
 accel/tcg/tcg-runtime-gvec.c | 255 +++++++++++++
 tcg/tcg-op-gvec.c            | 853 +++++++++++++++++++++++++++++++++++++++++++
 accel/tcg/Makefile.objs      |   2 +-
 7 files changed, 1326 insertions(+), 2 deletions(-)
 create mode 100644 tcg/tcg-gvec-desc.h
 create mode 100644 tcg/tcg-op-gvec.h
 create mode 100644 accel/tcg/tcg-runtime-gvec.c
 create mode 100644 tcg/tcg-op-gvec.c

-- 
2.13.5

Reviewed-by: Alex Bennée

diff --git a/Makefile.target b/Makefile.target
index 6361f957fb..f9967feef5 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -94,7 +94,7 @@ all: $(PROGS) stap
 obj-y += exec.o
 obj-y += accel/
 obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
-obj-$(CONFIG_TCG) += tcg/tcg-common.o
+obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/tcg-op-gvec.o
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-y += fpu/softfloat.o
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index c41d38a557..61c0ce39d3 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -134,3 +134,27 @@ GEN_ATOMIC_HELPERS(xor_fetch)
 GEN_ATOMIC_HELPERS(xchg)
 
 #undef GEN_ATOMIC_HELPERS
+
+DEF_HELPER_FLAGS_3(gvec_mov, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_add8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_add64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_sub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_neg64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(gvec_not, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_and, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/tcg/tcg-gvec-desc.h b/tcg/tcg-gvec-desc.h
new file mode 100644
index 0000000000..8ba9a8168d
--- /dev/null
+++ b/tcg/tcg-gvec-desc.h
@@ -0,0 +1,49 @@
+/*
+ * Generic vector operation descriptor
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* ??? These bit widths are set for ARM SVE, maxing out at 256 byte vectors. */
+#define SIMD_OPRSZ_SHIFT   0
+#define SIMD_OPRSZ_BITS    5
+
+#define SIMD_MAXSZ_SHIFT   (SIMD_OPRSZ_SHIFT + SIMD_OPRSZ_BITS)
+#define SIMD_MAXSZ_BITS    5
+
+#define SIMD_DATA_SHIFT    (SIMD_MAXSZ_SHIFT + SIMD_MAXSZ_BITS)
+#define SIMD_DATA_BITS     (32 - SIMD_DATA_SHIFT)
+
+/* Create a descriptor from components.  */
+uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data);
+
+/* Extract the operation size from a descriptor.  */
+static inline intptr_t simd_oprsz(uint32_t desc)
+{
+    return (extract32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS) + 1) * 8;
+}
+
+/* Extract the max vector size from a descriptor.  */
+static inline intptr_t simd_maxsz(uint32_t desc)
+{
+    return (extract32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS) + 1) * 8;
+}
+
+/* Extract the operation-specific data from a descriptor.  */
+static inline int32_t simd_data(uint32_t desc)
+{
+    return sextract32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS);
+}
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
new file mode 100644
index 0000000000..28bd77f1dc
--- /dev/null
+++ b/tcg/tcg-op-gvec.h
@@ -0,0 +1,143 @@
+/*
+ * Generic vector operation expansion
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * "Generic" vectors.  All operands are given as offsets from ENV,
+ * and therefore cannot also be allocated via tcg_global_mem_new_*.
+ * OPRSZ is the byte size of the vector upon which the operation is performed.
+ * MAXSZ is the byte size of the full vector; bytes beyond OPRSZ are cleared.
+ *
+ * All sizes must be 8 or any multiple of 16.
+ * When OPRSZ is 8, the alignment may be 8, otherwise must be 16.
+ * Operands may completely, but not partially, overlap.
+ */
+
+/* Expand a call to a gvec-style helper, with pointers to two vector
+   operands, and a descriptor (see tcg-gvec-desc.h).  */
+typedef void (gen_helper_gvec_2)(TCGv_ptr, TCGv_ptr, TCGv_i32);
+void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs,
+                        uint32_t oprsz, uint32_t maxsz, int32_t data,
+                        gen_helper_gvec_2 *fn);
+
+/* Similarly, passing an extra pointer (e.g. env or float_status).
*/ +typedef void (gen_helper_gvec_2_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_2_ptr *fn); + +/* Similarly, with three vector operands. */ +typedef void (gen_helper_gvec_3)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_3 *fn); + +typedef void (gen_helper_gvec_3_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_3_ptr *fn); + +/* Expand a gvec operation. Either inline or out-of-line depending on + the actual vector size and the operations supported by the host. */ +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_2 *fno; + /* Prefer i64 to v64. */ + bool prefer_i64; +} GVecGen2; + +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(TCGv_vec, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_3 *fno; + /* Prefer i64 to v64. */ + bool prefer_i64; + /* Load dest as a 3rd source operand. 
*/ + bool load_dest; +} GVecGen3; + +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz, const GVecGen2 *); +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t opsz, uint32_t clsz, const GVecGen3 *); + +/* Expand a specific vector operation. */ + +#define DEF(X) \ + void tcg_gen_gvec_##X(uint32_t dofs, uint32_t aofs, \ + uint32_t opsz, uint32_t clsz) + +DEF(mov); +DEF(not); +DEF(neg8); +DEF(neg16); +DEF(neg32); +DEF(neg64); + +#undef DEF +#define DEF(X) \ + void tcg_gen_gvec_##X(uint32_t dofs, uint32_t aofs, uint32_t bofs, \ + uint32_t opsz, uint32_t clsz) + +DEF(add8); +DEF(add16); +DEF(add32); +DEF(add64); + +DEF(sub8); +DEF(sub16); +DEF(sub32); +DEF(sub64); + +DEF(and); +DEF(or); +DEF(xor); +DEF(andc); +DEF(orc); + +#undef DEF + +/* + * 64-bit vector operations. Use these when the register has been allocated + * with tcg_global_mem_new_i64, and so we cannot also address it via pointer. + * OPRSZ = MAXSZ = 8. + */ + +void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 a); +void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 a); +void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 a); + +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); + +void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c new file mode 100644 index 0000000000..c75e76367c --- /dev/null +++ b/accel/tcg/tcg-runtime-gvec.c @@ -0,0 +1,255 @@ +/* + * Generic vectorized operation runtime + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; 
either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +#include "qemu/osdep.h" +#include "qemu/host-utils.h" +#include "cpu.h" +#include "exec/helper-proto.h" +#include "tcg-gvec-desc.h" + + +/* Virtually all hosts support 16-byte vectors. Those that don't can emulate + them via GCC's generic vector extension. This turns out to be simpler and + more reliable than getting the compiler to autovectorize. + + In tcg-op-gvec.c, we asserted that both the size and alignment + of the data are multiples of 16. */ + +typedef uint8_t vec8 __attribute__((vector_size(16))); +typedef uint16_t vec16 __attribute__((vector_size(16))); +typedef uint32_t vec32 __attribute__((vector_size(16))); +typedef uint64_t vec64 __attribute__((vector_size(16))); + +static inline void clear_high(void *d, intptr_t oprsz, uint32_t desc) +{ + intptr_t maxsz = simd_maxsz(desc); + intptr_t i; + + if (unlikely(maxsz > oprsz)) { + for (i = oprsz; i < maxsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = (vec64){ 0 }; + } + } +} + +void HELPER(gvec_add8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) + *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) + *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add32)(void *d, void *a, void 
*b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) + *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) + *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) - *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) - *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) - *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) - *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg8)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = -*(vec8 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg16)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = -*(vec16 
*)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg32)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = -*(vec32 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg64)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = -*(vec64 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mov)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + + memcpy(d, a, oprsz); + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_not)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = ~*(vec64 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_and)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) & *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_or)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) | *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_xor)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) ^ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_andc)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); 
+} + +void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c new file mode 100644 index 0000000000..7464321eba --- /dev/null +++ b/tcg/tcg-op-gvec.c @@ -0,0 +1,853 @@ +/* + * Generic vector operation expansion + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "tcg.h" +#include "tcg-op.h" +#include "tcg-op-gvec.h" +#include "tcg-gvec-desc.h" + +#define REP8(x) ((x) * 0x0101010101010101ull) +#define REP16(x) ((x) * 0x0001000100010001ull) + +#define MAX_UNROLL 4 + +/* Verify vector size and alignment rules. OFS should be the OR of all + of the operand offsets so that we can check them all at once. */ +static void check_size_align(uint32_t oprsz, uint32_t maxsz, uint32_t ofs) +{ + uint32_t align = maxsz > 16 || oprsz >= 16 ? 15 : 7; + tcg_debug_assert(oprsz > 0); + tcg_debug_assert(oprsz <= maxsz); + tcg_debug_assert((oprsz & align) == 0); + tcg_debug_assert((maxsz & align) == 0); + tcg_debug_assert((ofs & align) == 0); +} + +/* Verify vector overlap rules for two operands. 
*/ +static void check_overlap_2(uint32_t d, uint32_t a, uint32_t s) +{ + tcg_debug_assert(d == a || d + s <= a || a + s <= d); +} + +/* Verify vector overlap rules for three operands. */ +static void check_overlap_3(uint32_t d, uint32_t a, uint32_t b, uint32_t s) +{ + check_overlap_2(d, a, s); + check_overlap_2(d, b, s); + check_overlap_2(a, b, s); +} + +/* Create a descriptor from components. */ +uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data) +{ + uint32_t desc = 0; + + assert(oprsz % 8 == 0 && oprsz <= (8 << SIMD_OPRSZ_BITS)); + assert(maxsz % 8 == 0 && maxsz <= (8 << SIMD_MAXSZ_BITS)); + assert(data == sextract32(data, 0, SIMD_DATA_BITS)); + + oprsz = (oprsz / 8) - 1; + maxsz = (maxsz / 8) - 1; + desc = deposit32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS, oprsz); + desc = deposit32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS, maxsz); + desc = deposit32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS, data); + + return desc; +} + +/* Generate a call to a gvec-style helper with two vector operands. */ +void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_2 *fn) +{ + TCGv_ptr a0, a1; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs); + tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs); + + fn(a0, a1, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands. 
*/ +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_3 *fn) +{ + TCGv_ptr a0, a1, a2; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs); + tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs); + tcg_gen_addi_ptr(a2, tcg_ctx.tcg_env, bofs); + + fn(a0, a1, a2, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with two vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_2_ptr *fn) +{ + TCGv_ptr a0, a1; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs); + tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs); + + fn(a0, a1, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands + and an extra pointer operand. 
*/ +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_3_ptr *fn) +{ + TCGv_ptr a0, a1, a2; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, tcg_ctx.tcg_env, dofs); + tcg_gen_addi_ptr(a1, tcg_ctx.tcg_env, aofs); + tcg_gen_addi_ptr(a2, tcg_ctx.tcg_env, bofs); + + fn(a0, a1, a2, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_i32(desc); +} + +/* Return true if we want to implement something of OPRSZ bytes + in units of LNSZ. This limits the expansion of inline code. */ +static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz) +{ + uint32_t lnct = oprsz / lnsz; + return lnct >= 1 && lnct <= MAX_UNROLL; +} + +/* Clear MAXSZ bytes at DOFS. */ +static void expand_clr(uint32_t dofs, uint32_t maxsz) +{ + if (maxsz >= 16 && TCG_TARGET_HAS_v128) { + TCGv_vec zero; + + if (maxsz >= 32 && TCG_TARGET_HAS_v256) { + zero = tcg_temp_new_vec(TCG_TYPE_V256); + tcg_gen_movi_vec(zero, 0); + + for (; maxsz >= 32; dofs += 32, maxsz -= 32) { + tcg_gen_stl_vec(zero, tcg_ctx.tcg_env, dofs, TCG_TYPE_V256); + } + } else { + zero = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_movi_vec(zero, 0); + } + for (; maxsz >= 16; dofs += 16, maxsz -= 16) { + tcg_gen_stl_vec(zero, tcg_ctx.tcg_env, dofs, TCG_TYPE_V128); + } + + tcg_temp_free_vec(zero); + } else if (TCG_TARGET_REG_BITS == 64) { + TCGv_i64 zero = tcg_const_i64(0); + + for (; maxsz >= 8; dofs += 8, maxsz -= 8) { + tcg_gen_st_i64(zero, tcg_ctx.tcg_env, dofs); + } + + tcg_temp_free_i64(zero); + } else if (TCG_TARGET_HAS_v64) { + TCGv_vec zero = tcg_temp_new_vec(TCG_TYPE_V64); + + tcg_gen_movi_vec(zero, 0); + for (; maxsz >= 8; dofs += 8, maxsz -= 8) { + tcg_gen_st_vec(zero, tcg_ctx.tcg_env, dofs); + } + + tcg_temp_free_vec(zero); + } else { + TCGv_i32 zero = 
tcg_const_i32(0); + + for (; maxsz >= 4; dofs += 4, maxsz -= 4) { + tcg_gen_st_i32(zero, tcg_ctx.tcg_env, dofs); + } + + tcg_temp_free_i32(zero); + } +} + +/* Expand OPSZ bytes worth of two-operand operations using i32 elements. */ +static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t opsz, + void (*fni)(TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i); + fni(t0, t0); + tcg_gen_st_i32(t0, tcg_ctx.tcg_env, dofs + i); + } + tcg_temp_free_i32(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ +static void expand_3_i32(uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, bool load_dest, + void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + TCGv_i32 t2 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t0, tcg_ctx.tcg_env, aofs + i); + tcg_gen_ld_i32(t1, tcg_ctx.tcg_env, bofs + i); + if (load_dest) { + tcg_gen_ld_i32(t2, tcg_ctx.tcg_env, dofs + i); + } + fni(t2, t0, t1); + tcg_gen_st_i32(t2, tcg_ctx.tcg_env, dofs + i); + } + tcg_temp_free_i32(t2); + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + +/* Expand OPSZ bytes worth of two-operand operations using i64 elements. */ +static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t opsz, + void (*fni)(TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i); + fni(t0, t0); + tcg_gen_st_i64(t0, tcg_ctx.tcg_env, dofs + i); + } + tcg_temp_free_i64(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i64 elements. 
*/ +static void expand_3_i64(uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, bool load_dest, + void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t0, tcg_ctx.tcg_env, aofs + i); + tcg_gen_ld_i64(t1, tcg_ctx.tcg_env, bofs + i); + if (load_dest) { + tcg_gen_ld_i64(t2, tcg_ctx.tcg_env, dofs + i); + } + fni(t2, t0, t1); + tcg_gen_st_i64(t2, tcg_ctx.tcg_env, dofs + i); + } + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + +/* Expand OPSZ bytes worth of two-operand operations using host vectors. */ +static void expand_2_vec(uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t tysz, TCGType type, + void (*fni)(TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t0, tcg_ctx.tcg_env, aofs + i); + fni(t0, t0); + tcg_gen_st_vec(t0, tcg_ctx.tcg_env, dofs + i); + } + tcg_temp_free_vec(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using host vectors. */ +static void expand_3_vec(uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, + uint32_t tysz, TCGType type, bool load_dest, + void (*fni)(TCGv_vec, TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + TCGv_vec t2 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t0, tcg_ctx.tcg_env, aofs + i); + tcg_gen_ld_vec(t1, tcg_ctx.tcg_env, bofs + i); + if (load_dest) { + tcg_gen_ld_vec(t2, tcg_ctx.tcg_env, dofs + i); + } + fni(t2, t0, t1); + tcg_gen_st_vec(t2, tcg_ctx.tcg_env, dofs + i); + } + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + +/* Expand a vector two-operand operation. 
*/ +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_2_vec(dofs, aofs, done, 32, TCG_TYPE_V256, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_2_vec(dofs, aofs, done, 16, TCG_TYPE_V128, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && !g->prefer_i64) { + expand_2_vec(dofs, aofs, done, 8, TCG_TYPE_V64, g->fniv); + } else if (g->fni8) { + expand_2_i64(dofs, aofs, done, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_2_i32(dofs, aofs, done, g->fni4); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, g->fno); +} + +/* Expand a 
vector three-operand operation. */ +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_3_vec(dofs, aofs, bofs, done, 32, TCG_TYPE_V256, + g->load_dest, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_3_vec(dofs, aofs, bofs, done, 16, TCG_TYPE_V128, + g->load_dest, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && !g->prefer_i64) { + expand_3_vec(dofs, aofs, bofs, done, 8, TCG_TYPE_V64, + g->load_dest, g->fniv); + } else if (g->fni8) { + expand_3_i64(dofs, aofs, bofs, done, g->load_dest, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_3_i32(dofs, aofs, bofs, done, g->load_dest, g->fni4); + dofs += done; + aofs += done; + 
bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, g->fno); +} + +/* + * Expand specific vector operations. + */ + +void tcg_gen_gvec_mov(uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz) +{ + static const GVecGen2 g = { + .fni8 = tcg_gen_mov_i64, + .fniv = tcg_gen_mov_vec, + .fno = gen_helper_gvec_mov, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g); +} + +void tcg_gen_gvec_not(uint32_t dofs, uint32_t aofs, + uint32_t opsz, uint32_t clsz) +{ + static const GVecGen2 g = { + .fni8 = tcg_gen_not_i64, + .fniv = tcg_gen_not_vec, + .fno = gen_helper_gvec_not, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g); +} + +static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + + tcg_gen_andc_i64(t1, a, m); + tcg_gen_andc_i64(t2, b, m); + tcg_gen_xor_i64(t3, a, b); + tcg_gen_add_i64(d, t1, t2); + tcg_gen_and_i64(t3, t3, m); + tcg_gen_xor_i64(d, d, t3); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t3); +} + +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP8(0x80)); + gen_addv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP16(0x8000)); + gen_addv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + + tcg_gen_andi_i64(t1, a, ~0xffffffffull); + tcg_gen_add_i64(t2, a, b); + tcg_gen_add_i64(t1, t1, b); + tcg_gen_deposit_i64(d, t1, t2, 0, 32); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + 
+void tcg_gen_gvec_add8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_vec_add8_i64,
+        .fniv = tcg_gen_add8_vec,
+        .fno = gen_helper_gvec_add8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_vec_add16_i64,
+        .fniv = tcg_gen_add16_vec,
+        .fno = gen_helper_gvec_add16,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni4 = tcg_gen_add_i32,
+        .fniv = tcg_gen_add32_vec,
+        .fno = gen_helper_gvec_add32,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_add64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_add_i64,
+        .fniv = tcg_gen_add64_vec,
+        .fno = gen_helper_gvec_add64,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    tcg_gen_or_i64(t1, a, m);
+    tcg_gen_andc_i64(t2, b, m);
+    tcg_gen_eqv_i64(t3, a, b);
+    tcg_gen_sub_i64(d, t1, t2);
+    tcg_gen_and_i64(t3, t3, m);
+    tcg_gen_xor_i64(d, d, t3);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP8(0x80));
+    gen_subv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP16(0x8000));
+    gen_subv_mask(d, a, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, b, ~0xffffffffull);
+    tcg_gen_sub_i64(t2, a, b);
+    tcg_gen_sub_i64(t1, a, t1);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+void tcg_gen_gvec_sub8(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_vec_sub8_i64,
+        .fniv = tcg_gen_sub8_vec,
+        .fno = gen_helper_gvec_sub8,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub16(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_vec_sub16_i64,
+        .fniv = tcg_gen_sub16_vec,
+        .fno = gen_helper_gvec_sub16,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni4 = tcg_gen_sub_i32,
+        .fniv = tcg_gen_sub32_vec,
+        .fno = gen_helper_gvec_sub32,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_sub64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_sub_i64,
+        .fniv = tcg_gen_sub64_vec,
+        .fno = gen_helper_gvec_sub64,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m)
+{
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    tcg_gen_andc_i64(t3, m, b);
+    tcg_gen_andc_i64(t2, b, m);
+    tcg_gen_sub_i64(d, m, t2);
+    tcg_gen_xor_i64(d, d, t3);
+
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP8(0x80));
+    gen_negv_mask(d, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 b)
+{
+    TCGv_i64 m = tcg_const_i64(REP16(0x8000));
+    gen_negv_mask(d, b, m);
+    tcg_temp_free_i64(m);
+}
+
+void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 b)
+{
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t1, b, ~0xffffffffull);
+    tcg_gen_neg_i64(t2, b);
+    tcg_gen_neg_i64(t1, t1);
+    tcg_gen_deposit_i64(d, t1, t2, 0, 32);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
+void tcg_gen_gvec_neg8(uint32_t dofs, uint32_t aofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen2 g = {
+        .fni8 = tcg_gen_vec_neg8_i64,
+        .fniv = tcg_gen_neg8_vec,
+        .fno = gen_helper_gvec_neg8,
+    };
+    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_neg16(uint32_t dofs, uint32_t aofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen2 g = {
+        .fni8 = tcg_gen_vec_neg16_i64,
+        .fniv = tcg_gen_neg16_vec,
+        .fno = gen_helper_gvec_neg16,
+    };
+    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_neg32(uint32_t dofs, uint32_t aofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen2 g = {
+        .fni4 = tcg_gen_neg_i32,
+        .fniv = tcg_gen_neg32_vec,
+        .fno = gen_helper_gvec_neg32,
+    };
+    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_neg64(uint32_t dofs, uint32_t aofs,
+                        uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen2 g = {
+        .fni8 = tcg_gen_neg_i64,
+        .fniv = tcg_gen_neg64_vec,
+        .fno = gen_helper_gvec_neg64,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_2(dofs, aofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_and(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                      uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_and_i64,
+        .fniv = tcg_gen_and_vec,
+        .fno = gen_helper_gvec_and,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_or(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                     uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_or_i64,
+        .fniv = tcg_gen_or_vec,
+        .fno = gen_helper_gvec_or,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_xor(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                      uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_xor_i64,
+        .fniv = tcg_gen_xor_vec,
+        .fno = gen_helper_gvec_xor,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_andc(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                       uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_andc_i64,
+        .fniv = tcg_gen_andc_vec,
+        .fno = gen_helper_gvec_andc,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
+
+void tcg_gen_gvec_orc(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                      uint32_t opsz, uint32_t clsz)
+{
+    static const GVecGen3 g = {
+        .fni8 = tcg_gen_orc_i64,
+        .fniv = tcg_gen_orc_vec,
+        .fno = gen_helper_gvec_orc,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, opsz, clsz, &g);
+}
diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs
index 228cd84fa4..d381a02f34 100644
--- a/accel/tcg/Makefile.objs
+++ b/accel/tcg/Makefile.objs
@@ -1,6 +1,6 @@
 obj-$(CONFIG_SOFTMMU) += tcg-all.o
 obj-$(CONFIG_SOFTMMU) += cputlb.o
-obj-y += tcg-runtime.o
+obj-y += tcg-runtime.o tcg-runtime-gvec.o
 obj-y += cpu-exec.o cpu-exec-common.o translate-all.o
 obj-y += translator.o

From patchwork Sat Sep 16 02:34:14 2017
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 112767
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Fri, 15 Sep 2017 19:34:14 -0700
Message-Id: <20170916023417.14599-4-richard.henderson@linaro.org>
In-Reply-To: <20170916023417.14599-1-richard.henderson@linaro.org>
References: <20170916023417.14599-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v3 3/6] target/arm: Align vector registers
Cc: alex.bennee@linaro.org, f4bug@amsat.org

Signed-off-by: Richard Henderson
---
 target/arm/cpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.13.5

Signed-off-by: Alex Bennée

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 98b9b26fd3..c346bd148f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -486,7 +486,7 @@ typedef struct CPUARMState {
      * the two execution states, and means we do not need to explicitly
      * map these registers when changing states.
      */
-    float64 regs[64];
+    float64 regs[64] QEMU_ALIGNED(16);
     uint32_t xregs[16];
     /* We store these fpcsr fields separately for convenience.  */

From patchwork Sat Sep 16 02:34:15 2017
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 112769
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Fri, 15 Sep 2017 19:34:15 -0700
Message-Id: <20170916023417.14599-5-richard.henderson@linaro.org>
In-Reply-To: <20170916023417.14599-1-richard.henderson@linaro.org>
References: <20170916023417.14599-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v3 4/6] target/arm: Use vector infrastructure for aa64 add/sub/logic
Cc: alex.bennee@linaro.org, f4bug@amsat.org

Signed-off-by: Richard Henderson
---
 target/arm/translate-a64.c | 216 ++++++++++++++++++++++++++++++---------------
 1 file changed, 143 insertions(+), 73 deletions(-)

--
2.13.5

Reviewed-by: Alex Bennée

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a3984c9a0d..4759cc9829 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -21,6 +21,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "qemu/log.h"
 #include "arm_ldst.h"
 #include "translate.h"
@@ -82,6 +83,7 @@ typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
 typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32);
 typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void GVecGenTwoFn(uint32_t, uint32_t, uint32_t, uint32_t, uint32_t);
 
 /* initialize TCG globals.  */
 void a64_translate_init(void)
@@ -537,6 +539,21 @@ static inline int vec_reg_offset(DisasContext *s, int regno,
     return offs;
 }
 
+/* Return the offset info CPUARMState of the "whole" vector register Qn.  */
+static inline int vec_full_reg_offset(DisasContext *s, int regno)
+{
+    assert_fp_access_checked(s);
+    return offsetof(CPUARMState, vfp.regs[regno * 2]);
+}
+
+/* Return the byte size of the "whole" vector register, VL / 8.  */
+static inline int vec_full_reg_size(DisasContext *s)
+{
+    /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags.
+       In the meantime this is just the AdvSIMD length of 128.  */
+    return 128 / 8;
+}
+
 /* Return the offset into CPUARMState of a slice (from
  * the least significant end) of FP register Qn (ie
  * Dn, Sn, Hn or Bn).
@@ -9036,85 +9053,125 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
     }
 }
 
+static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    tcg_gen_xor_i64(rn, rn, rm);
+    tcg_gen_and_i64(rn, rn, rd);
+    tcg_gen_xor_i64(rd, rm, rn);
+}
+
+static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    tcg_gen_xor_i64(rn, rn, rd);
+    tcg_gen_and_i64(rn, rn, rm);
+    tcg_gen_xor_i64(rd, rd, rn);
+}
+
+static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    tcg_gen_xor_i64(rn, rn, rd);
+    tcg_gen_andc_i64(rn, rn, rm);
+    tcg_gen_xor_i64(rd, rd, rn);
+}
+
+static void gen_bsl_vec(TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
+{
+    tcg_gen_xor_vec(rn, rn, rm);
+    tcg_gen_and_vec(rn, rn, rd);
+    tcg_gen_xor_vec(rd, rm, rn);
+}
+
+static void gen_bit_vec(TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
+{
+    tcg_gen_xor_vec(rn, rn, rd);
+    tcg_gen_and_vec(rn, rn, rm);
+    tcg_gen_xor_vec(rd, rd, rn);
+}
+
+static void gen_bif_vec(TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
+{
+    tcg_gen_xor_vec(rn, rn, rd);
+    tcg_gen_andc_vec(rn, rn, rm);
+    tcg_gen_xor_vec(rd, rd, rn);
+}
+
 /* Logic op (opcode == 3) subgroup of C3.6.16.  */
 static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
 {
+    static const GVecGen3 bsl_op = {
+        .fni8 = gen_bsl_i64,
+        .fniv = gen_bsl_vec,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+        .load_dest = true
+    };
+    static const GVecGen3 bit_op = {
+        .fni8 = gen_bit_i64,
+        .fniv = gen_bit_vec,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+        .load_dest = true
+    };
+    static const GVecGen3 bif_op = {
+        .fni8 = gen_bif_i64,
+        .fniv = gen_bif_vec,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+        .load_dest = true
+    };
+
     int rd = extract32(insn, 0, 5);
     int rn = extract32(insn, 5, 5);
     int rm = extract32(insn, 16, 5);
     int size = extract32(insn, 22, 2);
     bool is_u = extract32(insn, 29, 1);
     bool is_q = extract32(insn, 30, 1);
-    TCGv_i64 tcg_op1, tcg_op2, tcg_res[2];
-    int pass;
+    GVecGenTwoFn *gvec_fn;
+    const GVecGen3 *gvec_op;
 
     if (!fp_access_check(s)) {
         return;
     }
 
-    tcg_op1 = tcg_temp_new_i64();
-    tcg_op2 = tcg_temp_new_i64();
-    tcg_res[0] = tcg_temp_new_i64();
-    tcg_res[1] = tcg_temp_new_i64();
-
-    for (pass = 0; pass < (is_q ? 2 : 1); pass++) {
-        read_vec_element(s, tcg_op1, rn, pass, MO_64);
-        read_vec_element(s, tcg_op2, rm, pass, MO_64);
-
-        if (!is_u) {
-            switch (size) {
-            case 0: /* AND */
-                tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 1: /* BIC */
-                tcg_gen_andc_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 2: /* ORR */
-                tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 3: /* ORN */
-                tcg_gen_orc_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            }
-        } else {
-            if (size != 0) {
-                /* B* ops need res loaded to operate on */
-                read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
-            }
-
-            switch (size) {
-            case 0: /* EOR */
-                tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2);
-                break;
-            case 1: /* BSL bitwise select */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);
-                break;
-            case 2: /* BIT, bitwise insert if true */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
-                break;
-            case 3: /* BIF, bitwise insert if false */
-                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
-                tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);
-                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
-                break;
-            }
-        }
-    }
+    switch (size + 4 * is_u) {
+    case 0: /* AND */
+        gvec_fn = tcg_gen_gvec_and;
+        goto do_fn;
+    case 1: /* BIC */
+        gvec_fn = tcg_gen_gvec_andc;
+        goto do_fn;
+    case 2: /* ORR */
+        gvec_fn = tcg_gen_gvec_or;
+        goto do_fn;
+    case 3: /* ORN */
+        gvec_fn = tcg_gen_gvec_orc;
+        goto do_fn;
+    case 4: /* EOR */
+        gvec_fn = tcg_gen_gvec_xor;
+        goto do_fn;
+    do_fn:
+        gvec_fn(vec_full_reg_offset(s, rd),
+                vec_full_reg_offset(s, rn),
+                vec_full_reg_offset(s, rm),
+                is_q ? 16 : 8, vec_full_reg_size(s));
+        return;
+
+    case 5: /* BSL bitwise select */
+        gvec_op = &bsl_op;
+        goto do_op;
+    case 6: /* BIT, bitwise insert if true */
+        gvec_op = &bit_op;
+        goto do_op;
+    case 7: /* BIF, bitwise insert if false */
+        gvec_op = &bif_op;
+        goto do_op;
+    do_op:
+        tcg_gen_gvec_3(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       is_q ? 16 : 8, vec_full_reg_size(s), gvec_op);
+        return;
 
-    write_vec_element(s, tcg_res[0], rd, 0, MO_64);
-    if (!is_q) {
-        tcg_gen_movi_i64(tcg_res[1], 0);
+    default:
+        g_assert_not_reached();
     }
-    write_vec_element(s, tcg_res[1], rd, 1, MO_64);
-
-    tcg_temp_free_i64(tcg_op1);
-    tcg_temp_free_i64(tcg_op2);
-    tcg_temp_free_i64(tcg_res[0]);
-    tcg_temp_free_i64(tcg_res[1]);
 }
 
 /* Helper functions for 32 bit comparisons */
@@ -9375,6 +9432,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
     int pass;
+    GVecGenTwoFn *gvec_op;
 
     switch (opcode) {
     case 0x13: /* MUL, PMUL */
@@ -9414,6 +9472,28 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
         return;
     }
 
+    switch (opcode) {
+    case 0x10: /* ADD, SUB */
+        {
+            static GVecGenTwoFn * const fns[4][2] = {
+                { tcg_gen_gvec_add8, tcg_gen_gvec_sub8 },
+                { tcg_gen_gvec_add16, tcg_gen_gvec_sub16 },
+                { tcg_gen_gvec_add32, tcg_gen_gvec_sub32 },
+                { tcg_gen_gvec_add64, tcg_gen_gvec_sub64 },
+            };
+            gvec_op = fns[size][u];
+            goto do_gvec;
+        }
+        break;
+
+    do_gvec:
+        gvec_op(vec_full_reg_offset(s, rd),
+                vec_full_reg_offset(s, rn),
+                vec_full_reg_offset(s, rm),
+                is_q ? 16 : 8, vec_full_reg_size(s));
+        return;
+    }
+
     if (size == 3) {
         assert(is_q);
         for (pass = 0; pass < 2; pass++) {
@@ -9586,16 +9666,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             genfn = fns[size][u];
             break;
         }
-        case 0x10: /* ADD, SUB */
-        {
-            static NeonGenTwoOpFn * const fns[3][2] = {
-                { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },
-                { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
-                { tcg_gen_add_i32, tcg_gen_sub_i32 },
-            };
-            genfn = fns[size][u];
-            break;
-        }
         case 0x11: /* CMTST, CMEQ */
         {
             static NeonGenTwoOpFn * const fns[3][2] = {

From patchwork Sat Sep 16 02:34:16 2017
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 112773
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Fri, 15 Sep 2017 19:34:16 -0700
Message-Id: <20170916023417.14599-6-richard.henderson@linaro.org>
In-Reply-To: <20170916023417.14599-1-richard.henderson@linaro.org>
References: <20170916023417.14599-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v3 5/6] tcg/i386: Add vector operations
Cc: alex.bennee@linaro.org, f4bug@amsat.org

Signed-off-by: Richard Henderson
---
 tcg/i386/tcg-target.h     |  36 +++-
 tcg/i386/tcg-target.inc.c | 423 +++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 413 insertions(+), 46 deletions(-)

--
2.13.5

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b89dababf4..df69f8db91 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -30,11 +30,10 @@
 #ifdef __x86_64__
 # define TCG_TARGET_REG_BITS 64
-# define TCG_TARGET_NB_REGS 16
 #else
 # define TCG_TARGET_REG_BITS 32
-# define TCG_TARGET_NB_REGS 8
 #endif
+# define TCG_TARGET_NB_REGS 24
 
 typedef enum {
     TCG_REG_EAX = 0,
@@ -56,6 +55,19 @@ typedef enum {
     TCG_REG_R13,
     TCG_REG_R14,
     TCG_REG_R15,
+
+    /* SSE registers; 64-bit has access to 8 more, but we won't
+       need more than a few and using only the first 8 minimizes
+       the need for a rex prefix on the sse instructions.  */
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+    TCG_REG_XMM7,
+
     TCG_REG_RAX = TCG_REG_EAX,
     TCG_REG_RCX = TCG_REG_ECX,
     TCG_REG_RDX = TCG_REG_EDX,
@@ -78,6 +90,17 @@ typedef enum {
 extern bool have_bmi1;
 extern bool have_popcnt;
 
+#ifdef __SSE2__
+#define have_sse2 true
+#else
+extern bool have_sse2;
+#endif
+#ifdef __AVX2__
+#define have_avx2 true
+#else
+extern bool have_avx2;
+#endif
+
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32 1
 #define TCG_TARGET_HAS_rot_i32 1
@@ -146,6 +169,15 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_mulsh_i64 0
 #endif
 
+#define TCG_TARGET_HAS_v64 have_sse2
+#define TCG_TARGET_HAS_v128 have_sse2
+#define TCG_TARGET_HAS_v256 have_avx2
+
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec 0
+#define TCG_TARGET_HAS_not_vec 0
+#define TCG_TARGET_HAS_neg_vec 0
+
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
      ((ofs) == 0 && (len) == 16))
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 69e49c9f58..df3be932d5 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -28,10 +28,11 @@
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #if TCG_TARGET_REG_BITS == 64
     "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
-    "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
 #else
     "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
 #endif
+    "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
+    "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7",
 };
 #endif
 
@@ -61,6 +62,14 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_EDX,
     TCG_REG_EAX,
 #endif
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+    TCG_REG_XMM7,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -94,7 +103,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_I32 0x400
 #define TCG_CT_CONST_WSZ 0x800
 
-/* Registers used with L constraint, which are the first argument 
+/* Registers used with L constraint, which are the first argument
    registers on x86_64, and two random call clobbered registers on
    i386. */
 #if TCG_TARGET_REG_BITS == 64
@@ -126,6 +135,16 @@ static bool have_cmov;
 bool have_bmi1;
 bool have_popcnt;
 
+#ifndef have_sse2
+bool have_sse2;
+#endif
+#ifdef have_avx2
+#define have_avx1 have_avx2
+#else
+static bool have_avx1;
+bool have_avx2;
+#endif
+
 #ifdef CONFIG_CPUID_H
 static bool have_movbe;
 static bool have_bmi2;
@@ -192,14 +211,17 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         tcg_regset_set_reg(ct->u.regs, TCG_REG_EDI);
         break;
     case 'q':
+        /* A register that can be used as a byte operand.  */
        ct->ct |= TCG_CT_REG;
        ct->u.regs = TCG_TARGET_REG_BITS == 64 ? 0xffff : 0xf;
        break;
    case 'Q':
+        /* A register with an addressable second byte (e.g. %ah).  */
        ct->ct |= TCG_CT_REG;
        ct->u.regs = 0xf;
        break;
    case 'r':
+        /* A general register.  */
        ct->ct |= TCG_CT_REG;
        ct->u.regs = TCG_TARGET_REG_BITS == 64 ? 0xffff : 0xff;
        break;
@@ -207,6 +229,11 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         /* With TZCNT/LZCNT, we can have operand-size as an input.  */
         ct->ct |= TCG_CT_CONST_WSZ;
         break;
+    case 'x':
+        /* A vector register.  */
+        ct->ct |= TCG_CT_REG;
+        ct->u.regs = 0xff0000;
+        break;
 
     /* qemu_ld/st address constraint */
     case 'L':
@@ -277,8 +304,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 # define P_REXB_RM 0
 # define P_GS 0
 #endif
-#define P_SIMDF3 0x10000 /* 0xf3 opcode prefix */
-#define P_SIMDF2 0x20000 /* 0xf2 opcode prefix */
+#define P_SIMDF3 0x20000 /* 0xf3 opcode prefix */
+#define P_SIMDF2 0x40000 /* 0xf2 opcode prefix */
+#define P_VEXL 0x80000 /* Set VEX.L = 1 */
 
 #define OPC_ARITH_EvIz (0x81)
 #define OPC_ARITH_EvIb (0x83)
@@ -310,11 +338,30 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVL_Iv (0xb8)
 #define OPC_MOVBE_GyMy (0xf0 | P_EXT38)
 #define OPC_MOVBE_MyGy (0xf1 | P_EXT38)
+#define OPC_MOVDQA_GyMy (0x6f | P_EXT | P_DATA16)
+#define OPC_MOVDQA_MyGy (0x7f | P_EXT | P_DATA16)
+#define OPC_MOVDQU_GyMy (0x6f | P_EXT | P_SIMDF3)
+#define OPC_MOVDQU_MyGy (0x7f | P_EXT | P_SIMDF3)
+#define OPC_MOVQ_GyMy (0x7e | P_EXT | P_SIMDF3)
+#define OPC_MOVQ_MyGy (0xd6 | P_EXT | P_DATA16)
 #define OPC_MOVSBL (0xbe | P_EXT)
 #define OPC_MOVSWL (0xbf | P_EXT)
 #define OPC_MOVSLQ (0x63 | P_REXW)
 #define OPC_MOVZBL (0xb6 | P_EXT)
 #define OPC_MOVZWL (0xb7 | P_EXT)
+#define OPC_PADDB (0xfc | P_EXT | P_DATA16)
+#define OPC_PADDW (0xfd | P_EXT | P_DATA16)
+#define OPC_PADDD (0xfe | P_EXT | P_DATA16)
+#define OPC_PADDQ (0xd4 | P_EXT | P_DATA16)
+#define OPC_PAND (0xdb | P_EXT | P_DATA16)
+#define OPC_PANDN (0xdf | P_EXT | P_DATA16)
+#define OPC_PCMPEQB (0x74 | P_EXT | P_DATA16)
+#define OPC_POR (0xeb | P_EXT | P_DATA16)
+#define OPC_PSUBB (0xf8 | P_EXT | P_DATA16)
+#define OPC_PSUBW (0xf9 | P_EXT | P_DATA16)
+#define OPC_PSUBD (0xfa | P_EXT | P_DATA16)
+#define OPC_PSUBQ (0xfb | P_EXT | P_DATA16)
+#define OPC_PXOR (0xef | P_EXT | P_DATA16)
 #define OPC_POP_r32 (0x58)
 #define OPC_POPCNT (0xb8 | P_EXT | P_SIMDF3)
 #define OPC_PUSH_r32 (0x50)
@@ -330,6 +377,7 @@ static inline int tcg_target_const_match(tcg_target_long val,
TCGType type, #define OPC_SHRX (0xf7 | P_EXT38 | P_SIMDF2) #define OPC_TESTL (0x85) #define OPC_TZCNT (0xbc | P_EXT | P_SIMDF3) +#define OPC_VZEROUPPER (0x77 | P_EXT) #define OPC_XCHG_ax_r32 (0x90) #define OPC_GRP3_Ev (0xf7) @@ -479,11 +527,20 @@ static void tcg_out_modrm(TCGContext *s, int opc, int r, int rm) tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } -static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) +static void tcg_out_vex_opc(TCGContext *s, int opc, int r, int v, + int rm, int index) { int tmp; - if ((opc & (P_REXW | P_EXT | P_EXT38)) || (rm & 8)) { + /* Use the two byte form if possible, which cannot encode + VEX.W, VEX.B, VEX.X, or an m-mmmm field other than P_EXT. */ + if ((opc & (P_EXT | P_EXT38 | P_REXW)) == P_EXT + && ((rm | index) & 8) == 0) { + /* Two byte VEX prefix. */ + tcg_out8(s, 0xc5); + + tmp = (r & 8 ? 0 : 0x80); /* VEX.R */ + } else { /* Three byte VEX prefix. */ tcg_out8(s, 0xc4); @@ -493,20 +550,17 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) } else if (opc & P_EXT) { tmp = 1; } else { - tcg_abort(); + g_assert_not_reached(); } - tmp |= 0x40; /* VEX.X */ - tmp |= (r & 8 ? 0 : 0x80); /* VEX.R */ - tmp |= (rm & 8 ? 0 : 0x20); /* VEX.B */ + tmp |= (r & 8 ? 0 : 0x80); /* VEX.R */ + tmp |= (index & 8 ? 0 : 0x40); /* VEX.X */ + tmp |= (rm & 8 ? 0 : 0x20); /* VEX.B */ tcg_out8(s, tmp); - tmp = (opc & P_REXW ? 0x80 : 0); /* VEX.W */ - } else { - /* Two byte VEX prefix. */ - tcg_out8(s, 0xc5); - - tmp = (r & 8 ? 0 : 0x80); /* VEX.R */ + tmp = (opc & P_REXW ? 0x80 : 0); /* VEX.W */ } + + tmp |= (opc & P_VEXL ? 
0x04 : 0); /* VEX.L */ /* VEX.pp */ if (opc & P_DATA16) { tmp |= 1; /* 0x66 */ @@ -518,6 +572,11 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) tmp |= (~v & 15) << 3; /* VEX.vvvv */ tcg_out8(s, tmp); tcg_out8(s, opc); +} + +static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) +{ + tcg_out_vex_opc(s, opc, r, v, rm, 0); tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } @@ -526,8 +585,8 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) mode for absolute addresses, ~RM is the size of the immediate operand that will follow the instruction. */ -static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, - int index, int shift, intptr_t offset) +static void tcg_out_sib_offset(TCGContext *s, int r, int rm, int index, + int shift, intptr_t offset) { int mod, len; @@ -538,7 +597,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, intptr_t pc = (intptr_t)s->code_ptr + 5 + ~rm; intptr_t disp = offset - pc; if (disp == (int32_t)disp) { - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (LOWREGMASK(r) << 3) | 5); tcg_out32(s, disp); return; @@ -548,7 +606,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, use of the MODRM+SIB encoding and is therefore larger than rip-relative addressing. */ if (offset == (int32_t)offset) { - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (4 << 3) | 5); tcg_out32(s, offset); @@ -556,10 +613,9 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, } /* ??? The memory isn't directly addressable. */ - tcg_abort(); + g_assert_not_reached(); } else { /* Absolute address. */ - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (r << 3) | 5); tcg_out32(s, offset); return; @@ -582,7 +638,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, that would be used for %esp is the escape to the two byte form. 
*/ if (index < 0 && LOWREGMASK(rm) != TCG_REG_ESP) { /* Single byte MODRM format. */ - tcg_out_opc(s, opc, r, rm, 0); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } else { /* Two byte MODRM+SIB format. */ @@ -596,7 +651,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, tcg_debug_assert(index != TCG_REG_ESP); } - tcg_out_opc(s, opc, r, rm, index); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (shift << 6) | (LOWREGMASK(index) << 3) | LOWREGMASK(rm)); } @@ -608,6 +662,21 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, } } +static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, + int index, int shift, intptr_t offset) +{ + tcg_out_opc(s, opc, r, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + +static void tcg_out_vex_modrm_sib_offset(TCGContext *s, int opc, int r, int v, + int rm, int index, int shift, + intptr_t offset) +{ + tcg_out_vex_opc(s, opc, r, v, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + /* A simplification of the above with no index or shift. 
*/ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, int rm, intptr_t offset) @@ -615,6 +684,31 @@ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, tcg_out_modrm_sib_offset(s, opc, r, rm, -1, 0, offset); } +static inline void tcg_out_vex_modrm_offset(TCGContext *s, int opc, int r, + int v, int rm, intptr_t offset) +{ + tcg_out_vex_modrm_sib_offset(s, opc, r, v, rm, -1, 0, offset); +} + +static void tcg_out_maybe_vex_modrm(TCGContext *s, int opc, int r, int rm) +{ + if (have_avx1) { + tcg_out_vex_modrm(s, opc, r, 0, rm); + } else { + tcg_out_modrm(s, opc, r, rm); + } +} + +static void tcg_out_maybe_vex_modrm_offset(TCGContext *s, int opc, int r, + int rm, intptr_t offset) +{ + if (have_avx1) { + tcg_out_vex_modrm_offset(s, opc, r, 0, rm, offset); + } else { + tcg_out_modrm_offset(s, opc, r, rm, offset); + } +} + /* Generate dest op= src. Uses the same ARITH_* codes as tgen_arithi. */ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) { @@ -625,12 +719,34 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src); } -static inline void tcg_out_mov(TCGContext *s, TCGType type, - TCGReg ret, TCGReg arg) +static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) { - if (arg != ret) { - int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? 
P_REXW : 0); - tcg_out_modrm(s, opc, ret, arg); + if (arg == ret) { + return; + } + switch (type) { + case TCG_TYPE_I32: + tcg_debug_assert(ret < 16 && arg < 16); + tcg_out_modrm(s, OPC_MOVL_GvEv, ret, arg); + break; + case TCG_TYPE_I64: + tcg_debug_assert(ret < 16 && arg < 16); + tcg_out_modrm(s, OPC_MOVL_GvEv | P_REXW, ret, arg); + break; + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_maybe_vex_modrm(s, OPC_MOVQ_GyMy, ret, arg); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_maybe_vex_modrm(s, OPC_MOVDQA_GyMy, ret, arg); + break; + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_vex_modrm(s, OPC_MOVDQA_GyMy | P_VEXL, ret, 0, arg); + break; + default: + g_assert_not_reached(); } } @@ -638,6 +754,36 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, tcg_target_long arg) { tcg_target_long diff; + int opc; + + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + tcg_debug_assert(ret < 16); + break; + + case TCG_TYPE_V64: + case TCG_TYPE_V128: + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16); + /* ??? Revisit this as the implementation progresses. */ + if (arg == 0) { + opc = OPC_PXOR; + } else if (arg == -1) { + opc = OPC_PCMPEQB; + } else { + g_assert_not_reached(); + } + if (have_avx1) { + tcg_out_vex_modrm(s, opc, ret, ret, ret); + } else { + tcg_out_modrm(s, opc, ret, ret); + } + return; + + default: + g_assert_not_reached(); + } if (arg == 0) { tgen_arithr(s, ARITH_XOR, ret, ret); @@ -702,18 +848,64 @@ static inline void tcg_out_pop(TCGContext *s, int reg) tcg_out_opc(s, OPC_POP_r32 + LOWREGMASK(reg), 0, reg, 0); } -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg arg1, intptr_t arg2) { - int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? 
P_REXW : 0); - tcg_out_modrm_offset(s, opc, ret, arg1, arg2); + switch (type) { + case TCG_TYPE_I64: + tcg_debug_assert(ret < 16); + tcg_out_modrm_offset(s, OPC_MOVL_GvEv | P_REXW, ret, arg1, arg2); + break; + case TCG_TYPE_I32: + tcg_debug_assert(ret < 16); + tcg_out_modrm_offset(s, OPC_MOVL_GvEv, ret, arg1, arg2); + break; + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 16); + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVQ_GyMy, ret, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 16); + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVDQU_GyMy, ret, arg1, arg2); + break; + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_GyMy | P_VEXL, + ret, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, + TCGReg arg1, intptr_t arg2) { - int opc = OPC_MOVL_EvGv + (type == TCG_TYPE_I64 ? 
P_REXW : 0); - tcg_out_modrm_offset(s, opc, arg, arg1, arg2); + switch (type) { + case TCG_TYPE_I64: + tcg_debug_assert(arg < 16); + tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_REXW, arg, arg1, arg2); + break; + case TCG_TYPE_I32: + tcg_debug_assert(arg < 16); + tcg_out_modrm_offset(s, OPC_MOVL_EvGv, arg, arg1, arg2); + break; + case TCG_TYPE_V64: + tcg_debug_assert(arg >= 16); + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVQ_MyGy, arg, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_debug_assert(arg >= 16); + tcg_out_maybe_vex_modrm_offset(s, OPC_MOVDQU_MyGy, arg, arg1, arg2); + break; + case TCG_TYPE_V256: + tcg_debug_assert(arg >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_MyGy | P_VEXL, + arg, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -725,6 +917,8 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, return false; } rexw = P_REXW; + } else if (type != TCG_TYPE_I32) { + return false; } tcg_out_modrm_offset(s, OPC_MOVL_EvIz | rexw, 0, base, ofs); tcg_out32(s, val); @@ -2254,19 +2448,110 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, } break; + case INDEX_op_add8_vec: + c = OPC_PADDB; + goto gen_simd; + case INDEX_op_add16_vec: + c = OPC_PADDW; + goto gen_simd; + case INDEX_op_add32_vec: + c = OPC_PADDD; + goto gen_simd; + case INDEX_op_add64_vec: + c = OPC_PADDQ; + goto gen_simd; + case INDEX_op_sub8_vec: + c = OPC_PSUBB; + goto gen_simd; + case INDEX_op_sub16_vec: + c = OPC_PSUBW; + goto gen_simd; + case INDEX_op_sub32_vec: + c = OPC_PSUBD; + goto gen_simd; + case INDEX_op_sub64_vec: + c = OPC_PSUBQ; + goto gen_simd; + case INDEX_op_and_vec: + c = OPC_PAND; + goto gen_simd; + case INDEX_op_or_vec: + c = OPC_POR; + goto gen_simd; + case INDEX_op_xor_vec: + c = OPC_PXOR; + gen_simd: + if (args[3] == 2) { + c |= P_VEXL; + } + if (have_avx1) { + tcg_out_vex_modrm(s, c, a0, a1, a2); + } else { + tcg_out_modrm(s, c, a0, a2); + } + break; + case 
INDEX_op_andc_vec: + c = OPC_PANDN; + if (args[3] == 2) { + c |= P_VEXL; + } + if (have_avx1) { + tcg_out_vex_modrm(s, c, a0, a2, a1); + } else { + tcg_out_modrm(s, c, a0, a1); + } + break; + + case INDEX_op_ld_vec: + case INDEX_op_ldz_vec: + switch (args[3]) { + case 0: + tcg_out_ld(s, TCG_TYPE_V64, a0, a1, a2); + break; + case 1: + tcg_out_ld(s, TCG_TYPE_V128, a0, a1, a2); + break; + case 2: + tcg_out_ld(s, TCG_TYPE_V256, a0, a1, a2); + break; + default: + g_assert_not_reached(); + } + break; + + case INDEX_op_st_vec: + switch (args[3]) { + case 0: + tcg_out_st(s, TCG_TYPE_V64, a0, a1, a2); + break; + case 1: + tcg_out_st(s, TCG_TYPE_V128, a0, a1, a2); + break; + case 2: + tcg_out_st(s, TCG_TYPE_V256, a0, a1, a2); + break; + default: + g_assert_not_reached(); + } + break; + case INDEX_op_mb: tcg_out_mb(s, a0); break; case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: + case INDEX_op_mov_vec: case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi. */ case INDEX_op_movi_i64: + case INDEX_op_movi_vec: case INDEX_op_call: /* Always emitted via tcg_out_call. 
*/ default: tcg_abort(); } #undef OP_32_64 +#undef OP_128_256 +#undef OP_64_128_256 } static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) @@ -2292,6 +2577,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) = { .args_ct_str = { "r", "r", "L", "L" } }; static const TCGTargetOpDef L_L_L_L = { .args_ct_str = { "L", "L", "L", "L" } }; + static const TCGTargetOpDef x_0_x = { .args_ct_str = { "x", "0", "x" } }; + static const TCGTargetOpDef x_x_0 = { .args_ct_str = { "x", "x", "0" } }; + static const TCGTargetOpDef x_x_x = { .args_ct_str = { "x", "x", "x" } }; + static const TCGTargetOpDef x_r = { .args_ct_str = { "x", "r" } }; switch (op) { case INDEX_op_goto_ptr: @@ -2493,6 +2782,26 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) return &s2; } + case INDEX_op_ld_vec: + case INDEX_op_ldz_vec: + case INDEX_op_st_vec: + return &x_r; + + case INDEX_op_add8_vec: + case INDEX_op_add16_vec: + case INDEX_op_add32_vec: + case INDEX_op_add64_vec: + case INDEX_op_sub8_vec: + case INDEX_op_sub16_vec: + case INDEX_op_sub32_vec: + case INDEX_op_sub64_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + return have_avx1 ? &x_x_x : &x_0_x; + case INDEX_op_andc_vec: + return have_avx1 ? &x_x_x : &x_x_0; + default: break; } @@ -2577,6 +2886,9 @@ static void tcg_target_qemu_prologue(TCGContext *s) tcg_out_addi(s, TCG_REG_CALL_STACK, stack_addend); + if (have_avx2) { + tcg_out_vex_opc(s, OPC_VZEROUPPER, 0, 0, 0, 0); + } for (i = ARRAY_SIZE(tcg_target_callee_save_regs) - 1; i >= 0; i--) { tcg_out_pop(s, tcg_target_callee_save_regs[i]); } @@ -2598,9 +2910,16 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count) static void tcg_target_init(TCGContext *s) { #ifdef CONFIG_CPUID_H - unsigned a, b, c, d; + unsigned a, b, c, d, b7 = 0; int max = __get_cpuid_max(0, 0); + if (max >= 7) { + /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. 
*/ + __cpuid_count(7, 0, a, b7, c, d); + have_bmi1 = (b7 & bit_BMI) != 0; + have_bmi2 = (b7 & bit_BMI2) != 0; + } + if (max >= 1) { __cpuid(1, a, b, c, d); #ifndef have_cmov @@ -2609,17 +2928,26 @@ static void tcg_target_init(TCGContext *s) available, we'll use a small forward branch. */ have_cmov = (d & bit_CMOV) != 0; #endif +#ifndef have_sse2 + have_sse2 = (d & bit_SSE2) != 0; +#endif /* MOVBE is only available on Intel Atom and Haswell CPUs, so we need to probe for it. */ have_movbe = (c & bit_MOVBE) != 0; have_popcnt = (c & bit_POPCNT) != 0; - } - if (max >= 7) { - /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ - __cpuid_count(7, 0, a, b, c, d); - have_bmi1 = (b & bit_BMI) != 0; - have_bmi2 = (b & bit_BMI2) != 0; +#ifndef have_avx2 + /* There are a number of things we must check before we can be + sure of not hitting invalid opcode. */ + if (c & bit_OSXSAVE) { + unsigned xcrl, xcrh; + asm ("xgetbv" : "=a" (xcrl), "=d" (xcrh) : "c" (0)); + if ((xcrl & 6) == 6) { + have_avx1 = (c & bit_AVX) != 0; + have_avx2 = (b7 & bit_AVX2) != 0; + } + } +#endif } max = __get_cpuid_max(0x8000000, 0); @@ -2636,6 +2964,13 @@ static void tcg_target_init(TCGContext *s) } else { tcg_target_available_regs[TCG_TYPE_I32] = 0xff; } + if (have_sse2) { + tcg_target_available_regs[TCG_TYPE_V64] = 0xff0000; + tcg_target_available_regs[TCG_TYPE_V128] = 0xff0000; + } + if (have_avx2) { + tcg_target_available_regs[TCG_TYPE_V256] = 0xff0000; + } tcg_target_call_clobber_regs = 0; tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_EAX); From patchwork Sat Sep 16 02:34:17 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 112768 Delivered-To: patch@linaro.org Received: by 10.140.106.117 with SMTP id d108csp1289973qgf; Fri, 15 Sep 2017 19:35:07 -0700 (PDT) X-Received: by 10.233.221.199 with SMTP id r190mr10651482qkf.174.1505529307298; Fri, 15 Sep 2017 19:35:07 -0700 (PDT) 
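tcg_out_vex_opc in the i386 patch picks the two-byte VEX prefix (0xC5) whenever VEX.W is clear, neither the r/m nor the index register needs the high bit (VEX.B/VEX.X), and the opcode lives in the 0F map; everything else falls back to the three-byte 0xC4 form. A self-contained sketch of that choice, using local flag names rather than the patch's P_* constants:

```c
#include <stddef.h>
#include <stdint.h>

enum { F_0F38 = 1, F_W = 2, F_L = 4, F_PP_66 = 8 };

/* Emit a VEX prefix for destination r, non-destructive source v, and
   an r/m + index pair; returns the number of bytes written.  */
static size_t emit_vex(uint8_t *out, int flags, int r, int v,
                       int rm, int index)
{
    uint8_t pp = (flags & F_PP_66) ? 1 : 0;          /* 66 prefix */
    uint8_t lpp = ((flags & F_L) ? 4 : 0) | pp | ((~v & 15) << 3);

    if ((flags & (F_0F38 | F_W)) == 0 && ((rm | index) & 8) == 0) {
        out[0] = 0xc5;                               /* two-byte form */
        out[1] = (r & 8 ? 0 : 0x80) | lpp;           /* ~R vvvv L pp */
        return 2;
    }
    out[0] = 0xc4;                                   /* three-byte form */
    out[1] = (r & 8 ? 0 : 0x80)                      /* ~R */
           | (index & 8 ? 0 : 0x40)                  /* ~X */
           | (rm & 8 ? 0 : 0x20)                     /* ~B */
           | ((flags & F_0F38) ? 2 : 1);             /* m-mmmm map */
    out[2] = ((flags & F_W) ? 0x80 : 0) | lpp;       /* W vvvv L pp */
    return 3;
}
```

For `vpxor %xmm0,%xmm0,%xmm0` this yields the compact `C5 F9` prefix; involving xmm8 forces the three-byte `C4 C1 79` form, mirroring the patch's comment about staying within the first eight SSE registers to keep encodings short.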
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Fri, 15 Sep 2017 19:34:17 -0700
Message-Id: <20170916023417.14599-7-richard.henderson@linaro.org>
In-Reply-To: <20170916023417.14599-1-richard.henderson@linaro.org>
References: <20170916023417.14599-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v3 6/6] tcg/aarch64: Add vector operations
Cc: alex.bennee@linaro.org, f4bug@amsat.org

Signed-off-by: Richard Henderson
---
 tcg/aarch64/tcg-target.h | 20 ++-
 tcg/aarch64/tcg-target.inc.c | 340 +++++++++++++++++++++++++++++++++++++------
 2 files changed, 315 insertions(+), 45 deletions(-)

-- 
2.13.5

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index c2525066ab..c3e8c4480f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -31,13 +31,22 @@ typedef enum {
     TCG_REG_SP = 31,
     TCG_REG_XZR = 31,
+
+    TCG_REG_V0 = 32, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+    TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+    TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+    TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+    TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+    TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+    TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+    TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
+
    /* Aliases.
*/
     TCG_REG_FP = TCG_REG_X29,
     TCG_REG_LR = TCG_REG_X30,
     TCG_AREG0 = TCG_REG_X19,
 } TCGReg;

-#define TCG_TARGET_NB_REGS 32
+#define TCG_TARGET_NB_REGS 64

 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_SP
@@ -113,6 +122,15 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64 1
 #define TCG_TARGET_HAS_direct_jump 1

+#define TCG_TARGET_HAS_v64 1
+#define TCG_TARGET_HAS_v128 1
+#define TCG_TARGET_HAS_v256 0
+
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec 1
+#define TCG_TARGET_HAS_not_vec 1
+#define TCG_TARGET_HAS_neg_vec 1
+
 #define TCG_TARGET_DEFAULT_MO (0)

 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 150530f30e..4b401cfe6c 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -20,10 +20,15 @@ QEMU_BUILD_BUG_ON(TCG_TYPE_I32 != 0 || TCG_TYPE_I64 != 1);
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
-    "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7",
-    "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15",
-    "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23",
-    "%x24", "%x25", "%x26", "%x27", "%x28", "%fp", "%x30", "%sp",
+    "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
+    "x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15",
+    "x16", "x17", "x18", "x19", "x20", "x21", "x22", "x23",
+    "x24", "x25", "x26", "x27", "x28", "fp", "x30", "sp",
+
+    "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
+    "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
+    "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
+    "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",
 };
 #endif /* CONFIG_DEBUG_TCG */
@@ -43,6 +48,14 @@ static const int tcg_target_reg_alloc_order[] = {
     /* X19 reserved for AREG0 */
     /* X29 reserved as fp */
    /* X30 reserved as temporary */
+
+    TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+    TCG_REG_V4, TCG_REG_V5,
TCG_REG_V6, TCG_REG_V7, + /* V8 - V15 are call-saved, and skipped. */ + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31, }; static const int tcg_target_call_iarg_regs[8] = { @@ -119,10 +132,14 @@ static const char *target_parse_constraint(TCGArgConstraint *ct, const char *ct_str, TCGType type) { switch (*ct_str++) { - case 'r': + case 'r': /* general registers */ ct->ct |= TCG_CT_REG; ct->u.regs = 0xffffffffu; break; + case 'w': /* advsimd registers */ + ct->ct |= TCG_CT_REG; + ct->u.regs = 0xffffffff00000000ull; + break; case 'l': /* qemu_ld / qemu_st address, data_reg */ ct->ct |= TCG_CT_REG; ct->u.regs = 0xffffffffu; @@ -290,6 +307,12 @@ typedef enum { I3312_LDRSHX = 0x38000000 | LDST_LD_S_X << 22 | MO_16 << 30, I3312_LDRSWX = 0x38000000 | LDST_LD_S_X << 22 | MO_32 << 30, + I3312_LDRVD = 0x3c000000 | LDST_LD << 22 | MO_64 << 30, + I3312_STRVD = 0x3c000000 | LDST_ST << 22 | MO_64 << 30, + + I3312_LDRVQ = 0x3c000000 | 3 << 22 | 0 << 30, + I3312_STRVQ = 0x3c000000 | 2 << 22 | 0 << 30, + I3312_TO_I3310 = 0x00200800, I3312_TO_I3313 = 0x01000000, @@ -374,8 +397,33 @@ typedef enum { I3510_EON = 0x4a200000, I3510_ANDS = 0x6a000000, - NOP = 0xd503201f, + /* AdvSIMD modified immediate */ + I3606_MOVI = 0x0f000400, + + /* AdvSIMD three same. */ + I3616_ADD_B = 0x0e208400, + I3616_ADD_H = 0x0e608400, + I3616_ADD_S = 0x0ea08400, + I3616_ADD_D = 0x4ee08400, + I3616_AND = 0x0e201c00, + I3616_BIC = 0x0e601c00, + I3616_EOR = 0x2e201c00, + I3616_ORR = 0x0ea01c00, + I3616_ORN = 0x0ee01c00, + I3616_SUB_B = 0x2e208400, + I3616_SUB_H = 0x2e608400, + I3616_SUB_S = 0x2ea08400, + I3616_SUB_D = 0x6ee08400, + + /* AdvSIMD two-reg misc. */ + I3617_NOT = 0x2e205800, + I3617_NEG_B = 0x2e20b800, + I3617_NEG_H = 0x2e60b800, + I3617_NEG_S = 0x2ea0b800, + I3617_NEG_D = 0x6ee0b800, + /* System instructions. 
 */
+    NOP             = 0xd503201f,
     DMB_ISH         = 0xd50338bf,
     DMB_LD          = 0x00000100,
     DMB_ST          = 0x00000200,
@@ -520,26 +568,47 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
     tcg_out32(s, insn | ext << 31 | rm << 16 | ra << 10 | rn << 5 | rd);
 }
 
+static void tcg_out_insn_3606(TCGContext *s, AArch64Insn insn, bool q,
+                              TCGReg rd, bool op, int cmode, uint8_t imm8)
+{
+    tcg_out32(s, insn | q << 30 | op << 29 | cmode << 12 | (rd & 0x1f)
+              | (imm8 & 0xe0) << 16 | (imm8 & 0x1f) << 5);
+}
+
+static void tcg_out_insn_3616(TCGContext *s, AArch64Insn insn, bool q,
+                              TCGReg rd, TCGReg rn, TCGReg rm)
+{
+    tcg_out32(s, insn | q << 30 | (rm & 0x1f) << 16
+              | (rn & 0x1f) << 5 | (rd & 0x1f));
+}
+
+static void tcg_out_insn_3617(TCGContext *s, AArch64Insn insn, bool q,
+                              TCGReg rd, TCGReg rn)
+{
+    tcg_out32(s, insn | q << 30 | (rn & 0x1f) << 5 | (rd & 0x1f));
+}
+
 static void tcg_out_insn_3310(TCGContext *s, AArch64Insn insn,
                               TCGReg rd, TCGReg base, TCGType ext,
                               TCGReg regoff)
 {
     /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
     tcg_out32(s, insn | I3312_TO_I3310 | regoff << 16 |
-              0x4000 | ext << 13 | base << 5 | rd);
+              0x4000 | ext << 13 | base << 5 | (rd & 0x1f));
 }
 
 static void tcg_out_insn_3312(TCGContext *s, AArch64Insn insn,
                               TCGReg rd, TCGReg rn, intptr_t offset)
 {
-    tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | rd);
+    tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | (rd & 0x1f));
 }
 
 static void tcg_out_insn_3313(TCGContext *s, AArch64Insn insn,
                               TCGReg rd, TCGReg rn, uintptr_t scaled_uimm)
 {
     /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
-    tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 | rn << 5 | rd);
+    tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10
+              | rn << 5 | (rd & 0x1f));
 }
 
 /* Register to register move using ORR (shifted register with no shift). */
@@ -594,6 +663,24 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     int s0, s1;
     AArch64Insn opc;
 
+    switch (type) {
+    case TCG_TYPE_I32:
+    case TCG_TYPE_I64:
+        tcg_debug_assert(rd < 32);
+        break;
+
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+        tcg_debug_assert(rd >= 32);
+        /* ??? Revisit this as the implementation progresses.  */
+        tcg_debug_assert(value == 0);
+        tcg_out_insn(s, 3606, MOVI, 0, rd, 0, 0, 0);
+        return;
+
+    default:
+        g_assert_not_reached();
+    }
+
     /* For 32-bit values, discard potential garbage in value.  For 64-bit
        values within [2**31, 2**32-1], we can create smaller sequences by
        interpreting this as a negative 32-bit number, while ensuring that
@@ -669,15 +756,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
 /* Define something more legible for general use.  */
 #define tcg_out_ldst_r  tcg_out_insn_3310
 
-static void tcg_out_ldst(TCGContext *s, AArch64Insn insn,
-                         TCGReg rd, TCGReg rn, intptr_t offset)
+static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd,
+                         TCGReg rn, intptr_t offset, int lgsize)
 {
-    TCGMemOp size = (uint32_t)insn >> 30;
-
     /* If the offset is naturally aligned and in range, then we can
        use the scaled uimm12 encoding */
-    if (offset >= 0 && !(offset & ((1 << size) - 1))) {
-        uintptr_t scaled_uimm = offset >> size;
+    if (offset >= 0 && !(offset & ((1 << lgsize) - 1))) {
+        uintptr_t scaled_uimm = offset >> lgsize;
         if (scaled_uimm <= 0xfff) {
             tcg_out_insn_3313(s, insn, rd, rn, scaled_uimm);
             return;
@@ -695,32 +780,94 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn,
     tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP);
 }
 
-static inline void tcg_out_mov(TCGContext *s,
-                               TCGType type, TCGReg ret, TCGReg arg)
+static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
-    if (ret != arg) {
+    if (ret == arg) {
+        return;
+    }
+    switch (type) {
+    case TCG_TYPE_I32:
+    case TCG_TYPE_I64:
+        tcg_debug_assert(ret < 32 && arg < 32);
         tcg_out_movr(s, type, ret, arg);
+        break;
+
+    case TCG_TYPE_V64:
+        tcg_debug_assert(ret >= 32 && arg >= 32);
+        tcg_out_insn(s, 3616, ORR, 0, ret, arg, arg);
+        break;
+    case TCG_TYPE_V128:
+        tcg_debug_assert(ret >= 32 && arg >= 32);
+        tcg_out_insn(s, 3616, ORR, 1, ret, arg, arg);
+        break;
+
+    default:
+        g_assert_not_reached();
     }
 }
 
-static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
-                              TCGReg arg1, intptr_t arg2)
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
+                       TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_LDRW : I3312_LDRX,
-                 arg, arg1, arg2);
+    AArch64Insn insn;
+    int lgsz;
+
+    switch (type) {
+    case TCG_TYPE_I32:
+        insn = I3312_LDRW;
+        lgsz = 2;
+        break;
+    case TCG_TYPE_I64:
+        insn = I3312_LDRX;
+        lgsz = 3;
+        break;
+    case TCG_TYPE_V64:
+        insn = I3312_LDRVD;
+        lgsz = 3;
+        break;
+    case TCG_TYPE_V128:
+        insn = I3312_LDRVQ;
+        lgsz = 4;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    tcg_out_ldst(s, insn, arg, arg1, arg2, lgsz);
 }
 
-static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
-                              TCGReg arg1, intptr_t arg2)
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
+                       TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_STRW : I3312_STRX,
-                 arg, arg1, arg2);
+    AArch64Insn insn;
+    int lgsz;
+
+    switch (type) {
+    case TCG_TYPE_I32:
+        insn = I3312_STRW;
+        lgsz = 2;
+        break;
+    case TCG_TYPE_I64:
+        insn = I3312_STRX;
+        lgsz = 3;
+        break;
+    case TCG_TYPE_V64:
+        insn = I3312_STRVD;
+        lgsz = 3;
+        break;
+    case TCG_TYPE_V128:
+        insn = I3312_STRVQ;
+        lgsz = 4;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    tcg_out_ldst(s, insn, arg, arg1, arg2, lgsz);
 }
 
 static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
                                TCGReg base, intptr_t ofs)
 {
-    if (val == 0) {
+    if (type <= TCG_TYPE_I64 && val == 0) {
         tcg_out_st(s, type, TCG_REG_XZR, base, ofs);
         return true;
     }
@@ -1210,14 +1357,15 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
     /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
        X0 = load [X2 + (tlb_offset & 0x000fff)] */
     tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX,
-                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
+                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff,
+                 TARGET_LONG_BITS == 32 ? 2 : 3);
     /* Load the tlb addend.  Do that early to avoid stalling.
        X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
     tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2,
                  (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
                  (is_read ? offsetof(CPUTLBEntry, addr_read)
-                  : offsetof(CPUTLBEntry, addr_write)));
+                  : offsetof(CPUTLBEntry, addr_write)), 3);
 
     /* Perform the address comparison.  */
     tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0);
@@ -1435,49 +1583,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8u_i64:
-        tcg_out_ldst(s, I3312_LDRB, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRB, a0, a1, a2, 0);
         break;
     case INDEX_op_ld8s_i32:
-        tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2, 0);
         break;
     case INDEX_op_ld8s_i64:
-        tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2, 0);
         break;
     case INDEX_op_ld16u_i32:
    case INDEX_op_ld16u_i64:
-        tcg_out_ldst(s, I3312_LDRH, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRH, a0, a1, a2, 1);
         break;
     case INDEX_op_ld16s_i32:
-        tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2, 1);
         break;
     case INDEX_op_ld16s_i64:
-        tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2, 1);
         break;
     case INDEX_op_ld_i32:
    case INDEX_op_ld32u_i64:
-        tcg_out_ldst(s, I3312_LDRW, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRW, a0, a1, a2, 2);
         break;
     case INDEX_op_ld32s_i64:
-        tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2, 2);
         break;
     case INDEX_op_ld_i64:
-        tcg_out_ldst(s, I3312_LDRX, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRX, a0, a1, a2, 3);
         break;
 
     case INDEX_op_st8_i32:
     case INDEX_op_st8_i64:
-        tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2, 0);
         break;
     case INDEX_op_st16_i32:
     case INDEX_op_st16_i64:
-        tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2, 1);
         break;
     case INDEX_op_st_i32:
     case INDEX_op_st32_i64:
-        tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2, 2);
         break;
     case INDEX_op_st_i64:
-        tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2, 3);
         break;
 
     case INDEX_op_add_i32:
@@ -1774,13 +1922,77 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_mb(s, a0);
         break;
 
+    case INDEX_op_ld_vec:
+    case INDEX_op_ldz_vec:
+        tcg_out_ld(s, TCG_TYPE_V64 + args[3], a0, a1, a2);
+        break;
+    case INDEX_op_st_vec:
+        tcg_out_st(s, TCG_TYPE_V64 + args[3], a0, a1, a2);
+        break;
+    case INDEX_op_add8_vec:
+        tcg_out_insn(s, 3616, ADD_B, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_add16_vec:
+        tcg_out_insn(s, 3616, ADD_H, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_add32_vec:
+        tcg_out_insn(s, 3616, ADD_S, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_add64_vec:
+        tcg_out_insn(s, 3616, ADD_D, 1, a0, a1, a2);
+        break;
+    case INDEX_op_sub8_vec:
+        tcg_out_insn(s, 3616, SUB_B, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_sub16_vec:
+        tcg_out_insn(s, 3616, SUB_H, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_sub32_vec:
+        tcg_out_insn(s, 3616, SUB_S, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_sub64_vec:
+        tcg_out_insn(s, 3616, SUB_D, 1, a0, a1, a2);
+        break;
+    case INDEX_op_neg8_vec:
+        tcg_out_insn(s, 3617, NEG_B, a2, a0, a1);
+        break;
+    case INDEX_op_neg16_vec:
+        tcg_out_insn(s, 3617, NEG_H, a2, a0, a1);
+        break;
+    case INDEX_op_neg32_vec:
+        tcg_out_insn(s, 3617, NEG_S, a2, a0, a1);
+        break;
+    case INDEX_op_neg64_vec:
+        tcg_out_insn(s, 3617, NEG_D, 1, a0, a1);
+        break;
+    case INDEX_op_and_vec:
+        tcg_out_insn(s, 3616, AND, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_or_vec:
+        tcg_out_insn(s, 3616, ORR, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_xor_vec:
+        tcg_out_insn(s, 3616, EOR, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_andc_vec:
+        tcg_out_insn(s, 3616, BIC, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_orc_vec:
+        tcg_out_insn(s, 3616, ORN, args[3], a0, a1, a2);
+        break;
+    case INDEX_op_not_vec:
+        tcg_out_insn(s, 3617, NOT, a2, a0, a1);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
+    case INDEX_op_mov_vec:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_movi_i64:
+    case INDEX_op_movi_vec:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
-        tcg_abort();
+        g_assert_not_reached();
     }
 
 #undef REG0
@@ -1790,11 +2002,14 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 {
     static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
     static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } };
+    static const TCGTargetOpDef w_w = { .args_ct_str = { "w", "w" } };
+    static const TCGTargetOpDef w_r = { .args_ct_str = { "w", "r" } };
     static const TCGTargetOpDef r_l = { .args_ct_str = { "r", "l" } };
     static const TCGTargetOpDef r_rA = { .args_ct_str = { "r", "rA" } };
     static const TCGTargetOpDef rZ_r = { .args_ct_str = { "rZ", "r" } };
     static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
+    static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_r_rA = { .args_ct_str = { "r", "r", "rA" } };
     static const TCGTargetOpDef r_r_rL = { .args_ct_str = { "r", "r", "rL" } };
@@ -1938,6 +2153,33 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sub2_i64:
         return &add2;
 
+    case INDEX_op_add8_vec:
+    case INDEX_op_add16_vec:
+    case INDEX_op_add32_vec:
+    case INDEX_op_add64_vec:
+    case INDEX_op_sub8_vec:
+    case INDEX_op_sub16_vec:
+    case INDEX_op_sub32_vec:
+    case INDEX_op_sub64_vec:
+    case INDEX_op_and_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_xor_vec:
+    case INDEX_op_andc_vec:
+    case INDEX_op_orc_vec:
+        return &w_w_w;
+
+    case INDEX_op_not_vec:
+    case INDEX_op_neg8_vec:
+    case INDEX_op_neg16_vec:
+    case INDEX_op_neg32_vec:
+    case INDEX_op_neg64_vec:
+        return &w_w;
+
+    case INDEX_op_ld_vec:
+    case INDEX_op_ldz_vec:
+    case INDEX_op_st_vec:
+        return &w_r;
+
     default:
         return NULL;
     }
@@ -1947,8 +2189,10 @@ static void tcg_target_init(TCGContext *s)
 {
     tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffffu;
     tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffffu;
+    tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull;
+    tcg_target_available_regs[TCG_TYPE_V128] = 0xffffffff00000000ull;
 
-    tcg_target_call_clobber_regs = 0xfffffffu;
+    tcg_target_call_clobber_regs = -1ull;
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X19);
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X20);
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X21);
@@ -1960,6 +2204,14 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X27);
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X28);
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X29);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V8);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V9);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V10);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V11);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V12);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V13);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V14);
+    tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V15);
 
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);