From patchwork Sat Jan 6 03:13:24 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123600 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp121427qgn; Fri, 5 Jan 2018 19:14:19 -0800 (PST) X-Google-Smtp-Source: ACJfBot9tLa9crDL+dxScgrYmV0o85P08sCPOoBZd2cIHOW8jNLWg/l8EbWkLEJ+gjguQVV9RRrv X-Received: by 10.129.131.22 with SMTP id t22mr4589685ywf.403.1515208459694; Fri, 05 Jan 2018 19:14:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208459; cv=none; d=google.com; s=arc-20160816; b=D0kZBOxPiTFuC1O2jZ+2ThqsxSA2KD4+cun3TQ6a7x3pUT8xXEURbnPKcY/zszf2z3 ccmMCH49ImStCB4KpPoOB3u6YRYsHnbJGWLVD9mHWtaONsEv3Fc4RhexXv7ltwa3I8tV wKUzN46ShDgnV4GZwpytGkDUdWMmMr2LTogYhTLqJr2MHcsGcZV47q064puiFPLOXKs5 +SgvZOu0yJ0XXa0g/w3qvpdA2xdAeOhvint1A3AQbWzSU0VRDJbfTZVNO7Js3E8x6849 67VSJLbhEjlLzMkGwUP6t5O7WNdYlg3BMLj2f7jatOpkAAsm/QOD/0chmgEZMy0Nw3JW bkJw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=hWNeV+zOx+oAfKu9VCxGrhv2jH/fOaNDlWpbf3/FB4Y=; b=tOCisAbKXQi7LYcrZ+bzJZglLf/mHaAykJrrUxQrOb/q/5WoY6qpTFkFdMfOGnlkff puvRporY8Ldaik6qOVw+U83GSZFnIIZoZb+4o61FC4aODGjk2cE+f4uflxLMwKvKFswo XjO0/pP7aXEfOeyAJyN9rD+YMd/sfqrJbXpD1Pc6PPd9IdSigy/f6aStd7WVCD9h8qbi Z8O5s8u9oEO4Csuc5cKRae0UuJFX74zOQjBDDu3ZWtYouDYpTQwYLlgwygZya7PQirJu Ck33UN7NMuFD4IhAHpcYWIzk9mjHsQcvthjMu8vHVXvQqn3qSuSnDR4/z9qu9Dq+QZ4W a3Iw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Ibqsvx1N; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id l12si49067ywh.692.2018.01.05.19.14.19 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:14:19 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Ibqsvx1N; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44020 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew7-0007Nz-3c for patch@linaro.org; Fri, 05 Jan 2018 22:14:19 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48180) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevj-0007Kt-Lf for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:58 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevg-0004aR-Oa for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:55 -0500 Received: from mail-pf0-x242.google.com ([2607:f8b0:400e:c00::242]:32976) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevg-0004Yq-GU for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:52 -0500 Received: by mail-pf0-x242.google.com with SMTP id y89so2973471pfk.0 for ; Fri, 05 Jan 2018 19:13:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=hWNeV+zOx+oAfKu9VCxGrhv2jH/fOaNDlWpbf3/FB4Y=; b=Ibqsvx1Ndn98jCzTS/DPvsJlqeoKJsbOyRlzm2ZXF3uey4bfvqYY1vmzcWxkq2AH2c HOgmMqEbs9GrBbMnAct1CnNtwC9KD+BZdA1B54oWEdbazFC7XzoBW7bJaKtFmPew+Qah SPgzxihbOfZH2XCJ9ggovfAgm4PU3g4SD7EvA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=hWNeV+zOx+oAfKu9VCxGrhv2jH/fOaNDlWpbf3/FB4Y=; b=f2iW+y+zP51SF4z52kibxFYTWmLg5vBJLW0+rMiLBgg3LQskwRI4umK+/REaOma60/ sJBwC5asIVihtwIuxim7/MMdWOAP1ijbP9ft4/WOMMAXIqXW4eNUxAylrj1o4NOfEwee EFTxuyWMPEcI/lUNV7MwMZ3yoJzVhJxq/TYm7nmYB1QK9q8PTa4gRaX7PkymfdLLU9YY 5ODENv/1zuPx2dGRSA3qAvTGVC+ENTzuBYtndcpTrrEkYhTnR9NqOaTpczf+VUEVAdun 6AM+u5wETcGTr1uo+f4MieL6qf/CiqA8c0MiQnB34X5NRzZK0Ct0u39tq2OD1o98nb2H cm5w== X-Gm-Message-State: AKGB3mKOaBiRc7iNxerZT0o9D2S320JNfaGkBMzoPUciZKefVb5hzM7y LsYpE/ebUXJnXB/eoEcuJY7PzFzQdpk= X-Received: by 10.99.53.200 with SMTP id c191mr4102499pga.276.1515208431167; Fri, 05 Jan 2018 19:13:51 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.13.49 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:13:50 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:24 -0800 Message-Id: <20180106031346.6650-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::242 Subject: [Qemu-devel] [PATCH v8 01/23] tcg: Allow multiple word entries into the constant pool X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" This will be required for storing vector constants. Signed-off-by: Richard Henderson --- tcg/tcg-pool.inc.c | 115 +++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 22 deletions(-) -- 2.14.3 diff --git a/tcg/tcg-pool.inc.c b/tcg/tcg-pool.inc.c index 8a85131405..0f76e7bee3 100644 --- a/tcg/tcg-pool.inc.c +++ b/tcg/tcg-pool.inc.c @@ -22,39 +22,110 @@ typedef struct TCGLabelPoolData { struct TCGLabelPoolData *next; - tcg_target_ulong data; tcg_insn_unit *label; - intptr_t addend; - int type; + int addend : 32; + int rtype : 16; + int nlong : 16; + tcg_target_ulong data[]; } TCGLabelPoolData; -static void new_pool_label(TCGContext *s, tcg_target_ulong data, int type, - tcg_insn_unit *label, intptr_t addend) +static TCGLabelPoolData *new_pool_alloc(TCGContext *s, int nlong, int rtype, + tcg_insn_unit *label, int addend) { - TCGLabelPoolData *n = tcg_malloc(sizeof(*n)); - TCGLabelPoolData *i, **pp; + TCGLabelPoolData *n = tcg_malloc(sizeof(TCGLabelPoolData) + + sizeof(tcg_target_ulong) * nlong); - n->data = data; n->label = label; - n->type = type; n->addend = addend; + n->rtype = rtype; + n->nlong = nlong; + return n; +} + +static void new_pool_insert(TCGContext *s, TCGLabelPoolData *n) +{ + TCGLabelPoolData *i, **pp; + int nlong = n->nlong; /* Insertion sort on the pool. */ - for (pp = &s->pool_labels; (i = *pp) && i->data < data; pp = &i->next) { - continue; + for (pp = &s->pool_labels; (i = *pp) != NULL; pp = &i->next) { + if (nlong > i->nlong) { + break; + } + if (nlong < i->nlong) { + continue; + } + if (memcmp(n->data, i->data, sizeof(tcg_target_ulong) * nlong) >= 0) { + break; + } } n->next = *pp; *pp = n; } +/* The "usual" for generic integer code. */ +static inline void new_pool_label(TCGContext *s, tcg_target_ulong d, int rtype, + tcg_insn_unit *label, int addend) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 1, rtype, label, addend); + n->data[0] = d; + new_pool_insert(s, n); +} + +/* For v64 or v128, depending on the host. */ +static inline void new_pool_l2(TCGContext *s, int rtype, tcg_insn_unit *label, + int addend, tcg_target_ulong d0, + tcg_target_ulong d1) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 2, rtype, label, addend); + n->data[0] = d0; + n->data[1] = d1; + new_pool_insert(s, n); +} + +/* For v128 or v256, depending on the host. */ +static inline void new_pool_l4(TCGContext *s, int rtype, tcg_insn_unit *label, + int addend, tcg_target_ulong d0, + tcg_target_ulong d1, tcg_target_ulong d2, + tcg_target_ulong d3) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 4, rtype, label, addend); + n->data[0] = d0; + n->data[1] = d1; + n->data[2] = d2; + n->data[3] = d3; + new_pool_insert(s, n); +} + +/* For v256, for 32-bit host. */ +static inline void new_pool_l8(TCGContext *s, int rtype, tcg_insn_unit *label, + int addend, tcg_target_ulong d0, + tcg_target_ulong d1, tcg_target_ulong d2, + tcg_target_ulong d3, tcg_target_ulong d4, + tcg_target_ulong d5, tcg_target_ulong d6, + tcg_target_ulong d7) +{ + TCGLabelPoolData *n = new_pool_alloc(s, 8, rtype, label, addend); + n->data[0] = d0; + n->data[1] = d1; + n->data[2] = d2; + n->data[3] = d3; + n->data[4] = d4; + n->data[5] = d5; + n->data[6] = d6; + n->data[7] = d7; + new_pool_insert(s, n); +} + /* To be provided by cpu/tcg-target.inc.c. */ static void tcg_out_nop_fill(tcg_insn_unit *p, int count); static bool tcg_out_pool_finalize(TCGContext *s) { TCGLabelPoolData *p = s->pool_labels; - tcg_target_ulong d, *a; + TCGLabelPoolData *l = NULL; + void *a; if (p == NULL) { return true; @@ -62,24 +133,24 @@ static bool tcg_out_pool_finalize(TCGContext *s) /* ??? Round up to qemu_icache_linesize, but then do not round again when allocating the next TranslationBlock structure. */ - a = (void *)ROUND_UP((uintptr_t)s->code_ptr, sizeof(tcg_target_ulong)); + a = (void *)ROUND_UP((uintptr_t)s->code_ptr, + sizeof(tcg_target_ulong) * p->nlong); tcg_out_nop_fill(s->code_ptr, (tcg_insn_unit *)a - s->code_ptr); s->data_gen_ptr = a; - /* Ensure the first comparison fails. */ - d = p->data + 1; - for (; p != NULL; p = p->next) { - if (p->data != d) { - d = p->data; - if (unlikely((void *)a > s->code_gen_highwater)) { + size_t size = sizeof(tcg_target_ulong) * p->nlong; + if (!l || l->nlong != p->nlong || memcmp(l->data, p->data, size)) { + if (unlikely(a > s->code_gen_highwater)) { return false; } - *a++ = d; + memcpy(a, p->data, size); + a += size; + l = p; } - patch_reloc(p->label, p->type, (intptr_t)(a - 1), p->addend); + patch_reloc(p->label, p->rtype, (intptr_t)a - size, p->addend); } - s->code_ptr = (void *)a; + s->code_ptr = a; return true; } From patchwork Sat Jan 6 03:13:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123604 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp123554qgn; Fri, 5 Jan 2018 19:17:30 -0800 (PST) X-Google-Smtp-Source: ACJfBosubNSg/lt7UBOJ0tfHhxcp037NChm8lLBJwAk2B6QjMdBWCX5+yGD699iFw1eyFN2W03ur X-Received: by 10.37.160.9 with SMTP id x9mr4732250ybh.179.1515208650270; Fri, 05 Jan 2018 19:17:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208650; cv=none; d=google.com; s=arc-20160816; b=kmWMUiI0LExdKbB/AzcaWlCu6hpaXbXTE+/DTOnG3aP5HOr2jK/48rpttauyviixIF zrqPR/OTLSO2Ngiah/jIkayaJqFa6vil7ezcClvamMMpyBrrBGYgU550esbzTw20Dp+5 +2RGZKM8xIZ/QXZ3DqkElCxsnTxAwneA5Uot8jboeaGFsYeT4qXwfzv9sVDIh7uBwSb6 Bt2Uqj0Ap3CX3UWhSPu+5AnxwlHXuQgOASNrrPhGfA0sfOBRz3n2PDx1tCZACY0p5/nC pOwYDFl4i8D0NAsS/12Pp6twRwMjDgIBBLJNgtQ02FZhpeNDLlZjLV9y/Lk9o/OVjl0N CLmw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=3ATsLciIpK6MJtUY82ncrcWUkwDTu7ZiPShPSdCMTh0=; b=Nmi1EPw4g/e1/R3vNwNmgv94n8cLcxqMHWsNMI9/iu4dBmdVHl/1u1m+jSEb4Tfo1/ cjWrAx+7C8oTidPA7IygHkZ+epyP0kEY/bztZW01vyg5Pn1gNP3OnenjOKwTONDMGsju sYtk+Rsqd1AePrDm4otRoQ4yXGx2YMUmPRCke9CIctPGiHETJX06aeZF6A2n3KReLZo3 aSWJbudLvcBiK1uvsUL84kFMjoYlGzpU4jQ/lfayNtCSDo5nOqOpRKloFBwfDpD/92JD 9hnhFV3qfnhaVAwIF+8Zp/E6qpk02l3PVNhgeddXvxcnkJ1HHlqEVqny5mSM+TzhlIjm ggBA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=h7NET1eK; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id n16si1492476ybk.741.2018.01.05.19.17.30 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:17:30 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=h7NET1eK; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44042 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXezB-0001Ow-Jp for patch@linaro.org; Fri, 05 Jan 2018 22:17:29 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48215) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevm-0007LI-G1 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:02 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevi-0004ee-Mc for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:58 -0500 Received: from mail-pg0-x242.google.com ([2607:f8b0:400e:c05::242]:40637) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevi-0004d4-Bn for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:54 -0500 Received: by mail-pg0-x242.google.com with SMTP id q12so2711410pgt.7 for ; Fri, 05 Jan 2018 19:13:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=3ATsLciIpK6MJtUY82ncrcWUkwDTu7ZiPShPSdCMTh0=; b=h7NET1eKStscFqtpr7ZJAakjwu0/YrvpmgKeRWFCJOqQWUSGw+vozfhzPqZGZIlic0 lUPWmssXaculV1F1h3LsbCIi5VJ7M1Pef0N9AFMstl+KNQsrUAqs/Ry5cvddlmODWCac lwCI4vuI0IlDveEWoAg8ZlKz3yvzdVn4xDILw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=3ATsLciIpK6MJtUY82ncrcWUkwDTu7ZiPShPSdCMTh0=; b=A1qOW9xxpHv47nIQQnrnXf7e01XTQ+ndN+YZjlZon9YAzxaTqnst6uBQdc1ieyD05a 1bRIESlWWH81jQf9SuoSjWpP0ENjLaqkMZyV3V+1bFAtgK3vX4r0GjICSKE4YGhWgEtm Bv7g2dbpiHJc1nI1V876lmF3RkP8N4e4ALTpToKNgt5ThHOCX/dtv2dricaed9p1yt7q XhXd3z9Mrku5ETRvk97QjHAPNTe7XFzarNkPDxLQi5lexyvawD7KDU/dX4Y0j5lmCFPj toC6Q54p1SRZE7b5sH0X216xaHddQmS0Lx05z0J6eCPF708X6K3TsNzyxuOqMFMNhSuI MNgg== X-Gm-Message-State: AKGB3mKJY94cEPN1ZAxvnbv3meehn3D9JTGXJ7tIIRexhfM0oNszPpb6 7dir+aSM796JpocW0q4sPlVkMEibTYM= X-Received: by 10.101.82.205 with SMTP id z13mr4107774pgp.29.1515208432776; Fri, 05 Jan 2018 19:13:52 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.13.51 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:13:51 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:25 -0800 Message-Id: <20180106031346.6650-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::242 Subject: [Qemu-devel] [PATCH v8 02/23] tcg: Add types and basic operations for host vectors X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Nothing uses or enables them yet. Signed-off-by: Richard Henderson --- Makefile.target | 4 +- tcg/tcg-op.h | 30 +++++ tcg/tcg-opc.h | 26 ++++ tcg/tcg.h | 56 +++++++++ tcg/tcg-op-vec.c | 362 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ tcg/tcg.c | 100 ++++++++++++++- tcg/README | 58 +++++++++ 7 files changed, 630 insertions(+), 6 deletions(-) create mode 100644 tcg/tcg-op-vec.c -- 2.14.3 diff --git a/Makefile.target b/Makefile.target index f9a9da7e7c..7f30a1e725 100644 --- a/Makefile.target +++ b/Makefile.target @@ -93,8 +93,8 @@ all: $(PROGS) stap # cpu emulator library obj-y += exec.o obj-y += accel/ -obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o -obj-$(CONFIG_TCG) += tcg/tcg-common.o +obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o +obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o obj-y += fpu/softfloat.o diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index ca07b32b65..9b0560e4d3 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -35,6 +35,10 @@ void tcg_gen_op4(TCGOpcode, TCGArg, TCGArg, TCGArg, TCGArg); void tcg_gen_op5(TCGOpcode, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg); void tcg_gen_op6(TCGOpcode, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg); +void vec_gen_2(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg); +void vec_gen_3(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg, TCGArg); +void vec_gen_4(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg, TCGArg, TCGArg); + static inline void tcg_gen_op1_i32(TCGOpcode opc, TCGv_i32 a1) { tcg_gen_op1(opc, tcgv_i32_arg(a1)); @@ -903,6 +907,30 @@ void tcg_gen_atomic_or_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); void tcg_gen_atomic_xor_fetch_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, TCGMemOp); void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); +void tcg_gen_mov_vec(TCGv_vec, TCGv_vec); +void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32); +void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64); +void tcg_gen_dup8i_vec(TCGv_vec, uint32_t); +void tcg_gen_dup16i_vec(TCGv_vec, uint32_t); +void tcg_gen_dup32i_vec(TCGv_vec, uint32_t); +void tcg_gen_dup64i_vec(TCGv_vec, uint64_t); +void tcg_gen_movi_v64(TCGv_vec, uint64_t); +void tcg_gen_movi_v128(TCGv_vec, uint64_t, uint64_t); +void tcg_gen_movi_v256(TCGv_vec, uint64_t, uint64_t, uint64_t, uint64_t); +void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a); + +void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); +void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); +void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t); + #if TARGET_LONG_BITS == 64 #define tcg_gen_movi_tl tcg_gen_movi_i64 #define tcg_gen_mov_tl tcg_gen_mov_i64 @@ -1001,6 +1029,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); #define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i64 #define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i64 #define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i64 +#define tcg_gen_dup_tl_vec tcg_gen_dup_i64_vec #else #define tcg_gen_movi_tl tcg_gen_movi_i32 #define tcg_gen_mov_tl tcg_gen_mov_i32 @@ -1098,6 +1127,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp); #define tcg_gen_atomic_and_fetch_tl tcg_gen_atomic_and_fetch_i32 #define tcg_gen_atomic_or_fetch_tl tcg_gen_atomic_or_fetch_i32 #define tcg_gen_atomic_xor_fetch_tl tcg_gen_atomic_xor_fetch_i32 +#define tcg_gen_dup_tl_vec tcg_gen_dup_i32_vec #endif #if UINTPTR_MAX == UINT32_MAX diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index 956fb1e9f3..4e62eda14b 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -204,8 +204,34 @@ DEF(qemu_ld_i64, DATA64_ARGS, TLADDR_ARGS, 1, DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT) +/* Host vector support. */ + +#define IMPLVEC TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec) + +DEF(mov_vec, 1, 1, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT) +DEF(movi_vec, 1, 0, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT) /* vecl defines const args */ +DEF(dupi_vec, 1, 0, 1, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT) + +DEF(dup_vec, 1, 1, 0, IMPLVEC) +DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32)) + +DEF(ld_vec, 1, 1, 1, IMPLVEC) +DEF(st_vec, 0, 2, 1, IMPLVEC) + +DEF(add_vec, 1, 2, 0, IMPLVEC) +DEF(sub_vec, 1, 2, 0, IMPLVEC) +DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec)) + +DEF(and_vec, 1, 2, 0, IMPLVEC) +DEF(or_vec, 1, 2, 0, IMPLVEC) +DEF(xor_vec, 1, 2, 0, IMPLVEC) +DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) +DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) +DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) + #undef TLADDR_ARGS #undef DATA64_ARGS #undef IMPL #undef IMPL64 +#undef IMPLVEC #undef DEF diff --git a/tcg/tcg.h b/tcg/tcg.h index 2ce497cebf..dce483b0ee 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -170,6 +170,27 @@ typedef uint64_t TCGRegSet; # error "Missing unsigned widening multiply" #endif +#if !defined(TCG_TARGET_HAS_v64) \ + && !defined(TCG_TARGET_HAS_v128) \ + && !defined(TCG_TARGET_HAS_v256) +#define TCG_TARGET_MAYBE_vec 0 +#define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_not_vec 0 +#define TCG_TARGET_HAS_andc_vec 0 +#define TCG_TARGET_HAS_orc_vec 0 +#else +#define TCG_TARGET_MAYBE_vec 1 +#endif +#ifndef TCG_TARGET_HAS_v64 +#define TCG_TARGET_HAS_v64 0 +#endif +#ifndef TCG_TARGET_HAS_v128 +#define TCG_TARGET_HAS_v128 0 +#endif +#ifndef TCG_TARGET_HAS_v256 +#define TCG_TARGET_HAS_v256 0 +#endif + #ifndef TARGET_INSN_START_EXTRA_WORDS # define TARGET_INSN_START_WORDS 1 #else @@ -246,6 +267,11 @@ typedef struct TCGPool { typedef enum TCGType { TCG_TYPE_I32, TCG_TYPE_I64, + + TCG_TYPE_V64, + TCG_TYPE_V128, + TCG_TYPE_V256, + TCG_TYPE_COUNT, /* number of different types */ /* An alias for the size of the host register. */ @@ -396,6 +422,8 @@ typedef tcg_target_ulong TCGArg; * TCGv_i32 : 32 bit integer type * TCGv_i64 : 64 bit integer type * TCGv_ptr : a host pointer type + * TCGv_vec : a host vector type; the exact size is not exposed + to the CPU front-end code. * TCGv : an integer type the same size as target_ulong (an alias for either TCGv_i32 or TCGv_i64) The compiler's type checking will complain if you mix them @@ -418,6 +446,7 @@ typedef tcg_target_ulong TCGArg; typedef struct TCGv_i32_d *TCGv_i32; typedef struct TCGv_i64_d *TCGv_i64; typedef struct TCGv_ptr_d *TCGv_ptr; +typedef struct TCGv_vec_d *TCGv_vec; typedef TCGv_ptr TCGv_env; #if TARGET_LONG_BITS == 32 #define TCGv TCGv_i32 @@ -589,6 +618,9 @@ typedef struct TCGOp { #define TCGOP_CALLI(X) (X)->param1 #define TCGOP_CALLO(X) (X)->param2 +#define TCGOP_VECL(X) (X)->param1 +#define TCGOP_VECE(X) (X)->param2 + /* Make sure operands fit in the bitfields above. */ QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8)); @@ -726,6 +758,11 @@ static inline TCGTemp *tcgv_ptr_temp(TCGv_ptr v) return tcgv_i32_temp((TCGv_i32)v); } +static inline TCGTemp *tcgv_vec_temp(TCGv_vec v) +{ + return tcgv_i32_temp((TCGv_i32)v); +} + static inline TCGArg tcgv_i32_arg(TCGv_i32 v) { return temp_arg(tcgv_i32_temp(v)); @@ -741,6 +778,11 @@ static inline TCGArg tcgv_ptr_arg(TCGv_ptr v) return temp_arg(tcgv_ptr_temp(v)); } +static inline TCGArg tcgv_vec_arg(TCGv_vec v) +{ + return temp_arg(tcgv_vec_temp(v)); +} + static inline TCGv_i32 temp_tcgv_i32(TCGTemp *t) { (void)temp_idx(t); /* trigger embedded assert */ @@ -757,6 +799,11 @@ static inline TCGv_ptr temp_tcgv_ptr(TCGTemp *t) return (TCGv_ptr)temp_tcgv_i32(t); } +static inline TCGv_vec temp_tcgv_vec(TCGTemp *t) +{ + return (TCGv_vec)temp_tcgv_i32(t); +} + #if TCG_TARGET_REG_BITS == 32 static inline TCGv_i32 TCGV_LOW(TCGv_i64 t) { @@ -832,9 +879,12 @@ TCGTemp *tcg_global_mem_new_internal(TCGType, TCGv_ptr, TCGv_i32 tcg_temp_new_internal_i32(int temp_local); TCGv_i64 tcg_temp_new_internal_i64(int temp_local); +TCGv_vec tcg_temp_new_vec(TCGType type); +TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match); void tcg_temp_free_i32(TCGv_i32 arg); void tcg_temp_free_i64(TCGv_i64 arg); +void tcg_temp_free_vec(TCGv_vec arg); static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset, const char *name) @@ -916,6 +966,8 @@ enum { /* Instruction is optional and not implemented by the host, or insn is generic and should not be implemened by the host. */ TCG_OPF_NOT_PRESENT = 0x10, + /* Instruction operands are vectors. */ + TCG_OPF_VECTOR = 0x20, }; typedef struct TCGOpDef { @@ -981,6 +1033,10 @@ TCGv_i32 tcg_const_i32(int32_t val); TCGv_i64 tcg_const_i64(int64_t val); TCGv_i32 tcg_const_local_i32(int32_t val); TCGv_i64 tcg_const_local_i64(int64_t val); +TCGv_vec tcg_const_zeros_vec(TCGType); +TCGv_vec tcg_const_ones_vec(TCGType); +TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec); +TCGv_vec tcg_const_ones_vec_matching(TCGv_vec); TCGLabel *gen_new_label(void); diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c new file mode 100644 index 0000000000..dc04c11860 --- /dev/null +++ b/tcg/tcg-op-vec.c @@ -0,0 +1,362 @@ +/* + * Tiny Code Generator for QEMU + * + * Copyright (c) 2008 Fabrice Bellard + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "cpu.h" +#include "exec/exec-all.h" +#include "tcg.h" +#include "tcg-op.h" +#include "tcg-mo.h" + +/* Reduce the number of ifdefs below. This assumes that all uses of + TCGV_HIGH and TCGV_LOW are properly protected by a conditional that + the compiler can eliminate. */ +#if TCG_TARGET_REG_BITS == 64 +extern TCGv_i32 TCGV_LOW_link_error(TCGv_i64); +extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64); +#define TCGV_LOW TCGV_LOW_link_error +#define TCGV_HIGH TCGV_HIGH_link_error +#endif + +void vec_gen_2(TCGOpcode opc, TCGType type, unsigned vece, TCGArg r, TCGArg a) +{ + TCGOp *op = tcg_emit_op(opc); + TCGOP_VECL(op) = type - TCG_TYPE_V64; + TCGOP_VECE(op) = vece; + op->args[0] = r; + op->args[1] = a; +} + +void vec_gen_3(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg r, TCGArg a, TCGArg b) +{ + TCGOp *op = tcg_emit_op(opc); + TCGOP_VECL(op) = type - TCG_TYPE_V64; + TCGOP_VECE(op) = vece; + op->args[0] = r; + op->args[1] = a; + op->args[2] = b; +} + +void vec_gen_4(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg r, TCGArg a, TCGArg b, TCGArg c) +{ + TCGOp *op = tcg_emit_op(opc); + TCGOP_VECL(op) = type - TCG_TYPE_V64; + TCGOP_VECE(op) = vece; + op->args[0] = r; + op->args[1] = a; + op->args[2] = b; + op->args[3] = c; +} + +static void vec_gen_op2(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGType type = rt->base_type; + + tcg_debug_assert(at->base_type == type); + vec_gen_2(opc, type, vece, temp_arg(rt), temp_arg(at)); +} + +static void vec_gen_op3(TCGOpcode opc, unsigned vece, + TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGType type = rt->base_type; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + vec_gen_3(opc, type, vece, temp_arg(rt), temp_arg(at), temp_arg(bt)); +} + +void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a) +{ + if (r != a) { + vec_gen_op2(INDEX_op_mov_vec, 0, r, a); + } +} + +#define MO_REG (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32) + +static void tcg_gen_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a); +} + +TCGv_vec tcg_const_zeros_vec(TCGType type) +{ + TCGv_vec ret = tcg_temp_new_vec(type); + tcg_gen_dupi_vec(ret, MO_REG, 0); + return ret; +} + +TCGv_vec tcg_const_ones_vec(TCGType type) +{ + TCGv_vec ret = tcg_temp_new_vec(type); + tcg_gen_dupi_vec(ret, MO_REG, -1); + return ret; +} + +TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec m) +{ + TCGTemp *t = tcgv_vec_temp(m); + return tcg_const_zeros_vec(t->base_type); +} + +TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m) +{ + TCGTemp *t = tcgv_vec_temp(m); + return tcg_const_ones_vec(t->base_type); +} + +void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a) +{ + if (TCG_TARGET_REG_BITS == 32 && a == deposit64(a, 32, 32, a)) { + tcg_gen_dupi_vec(r, MO_32, a); + } else if (TCG_TARGET_REG_BITS == 64 || a == (uint64_t)(int32_t)a) { + tcg_gen_dupi_vec(r, MO_64, a); + } else { + TCGv_i64 c = tcg_const_i64(a); + tcg_gen_dup_i64_vec(MO_64, r, c); + tcg_temp_free_i64(c); + } +} + +void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a) +{ + tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffffffffu) * a); +} + +void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a) +{ + tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffff) * (a & 0xffff)); +} + +void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a) +{ + tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xff) * (a & 0xff)); +} + +void tcg_gen_movi_v64(TCGv_vec r, uint64_t a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGArg ri = temp_arg(rt); + + tcg_debug_assert(rt->base_type == TCG_TYPE_V64); + if (TCG_TARGET_REG_BITS == 64) { + vec_gen_2(INDEX_op_movi_vec, TCG_TYPE_V64, 0, ri, a); + } else { + vec_gen_3(INDEX_op_movi_vec, TCG_TYPE_V64, 0, ri, a, a >> 32); + } +} + +void tcg_gen_movi_v128(TCGv_vec r, uint64_t a, uint64_t b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGArg ri = temp_arg(rt); + + tcg_debug_assert(rt->base_type == TCG_TYPE_V128); + if (a == b) { + tcg_gen_dup64i_vec(r, a); + } else if (TCG_TARGET_REG_BITS == 64) { + vec_gen_3(INDEX_op_movi_vec, TCG_TYPE_V128, 0, ri, a, b); + } else { + TCGOp *op = tcg_emit_op(INDEX_op_movi_vec); + TCGOP_VECL(op) = TCG_TYPE_V128 - TCG_TYPE_V64; + op->args[0] = ri; + op->args[1] = a; + op->args[2] = a >> 32; + op->args[3] = b; + op->args[4] = b >> 32; + } +} + +void tcg_gen_movi_v256(TCGv_vec r, uint64_t a, uint64_t b, + uint64_t c, uint64_t d) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGTemp *rt = arg_temp(ri); + + tcg_debug_assert(rt->base_type == TCG_TYPE_V256); + if (a == b && a == c && a == d) { + tcg_gen_dup64i_vec(r, a); + } else { + TCGOp *op = tcg_emit_op(INDEX_op_movi_vec); + TCGOP_VECL(op) = TCG_TYPE_V256 - TCG_TYPE_V64; + op->args[0] = ri; + if (TCG_TARGET_REG_BITS == 64) { + op->args[1] = a; + op->args[2] = b; + op->args[3] = c; + op->args[4] = d; + } else { + op->args[1] = a; + op->args[2] = a >> 32; + op->args[3] = b; + op->args[4] = b >> 32; + op->args[5] = c; + op->args[6] = c >> 32; + op->args[7] = d; + op->args[8] = d >> 32; + } + } +} + +void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + if (TCG_TARGET_REG_BITS == 64) { + TCGArg ai = tcgv_i64_arg(a); + vec_gen_2(INDEX_op_dup_vec, type, MO_64, ri, ai); + } else if (vece == MO_64) { + TCGArg al = tcgv_i32_arg(TCGV_LOW(a)); + TCGArg ah = tcgv_i32_arg(TCGV_HIGH(a)); + vec_gen_3(INDEX_op_dup2_vec, type, MO_64, ri, al, ah); + } else { + TCGArg ai = tcgv_i32_arg(TCGV_LOW(a)); + vec_gen_2(INDEX_op_dup_vec, type, MO_64, ri, ai); + } +} + +void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec r, TCGv_i32 a) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGArg ai = tcgv_i32_arg(a); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + vec_gen_2(INDEX_op_dup_vec, type, vece, ri, ai); +} + +static void vec_gen_ldst(TCGOpcode opc, TCGv_vec r, TCGv_ptr b, TCGArg o) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGArg bi = tcgv_ptr_arg(b); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + vec_gen_3(opc, type, 0, ri, bi, o); +} + +void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr b, TCGArg o) +{ + vec_gen_ldst(INDEX_op_ld_vec, r, b, o); +} + +void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr b, TCGArg o) +{ + vec_gen_ldst(INDEX_op_st_vec, r, b, o); +} + +void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType low_type) +{ + TCGArg ri = tcgv_vec_arg(r); + TCGArg bi = tcgv_ptr_arg(b); + TCGTemp *rt = arg_temp(ri); + TCGType type = rt->base_type; + + tcg_debug_assert(low_type >= TCG_TYPE_V64); + tcg_debug_assert(low_type <= type); + vec_gen_3(INDEX_op_st_vec, low_type, 0, ri, bi, o); +} + +void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_add_vec, vece, r, a, b); +} + +void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_sub_vec, vece, r, a, b); +} + +void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_and_vec, 0, r, a, b); +} + +void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_or_vec, 0, r, a, b); +} + +void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + vec_gen_op3(INDEX_op_xor_vec, 0, r, a, b); +} + +void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + if (TCG_TARGET_HAS_andc_vec) { + vec_gen_op3(INDEX_op_andc_vec, 0, r, a, b); + } else { + TCGv_vec t = tcg_temp_new_vec_matching(r); + tcg_gen_not_vec(0, t, b); + tcg_gen_and_vec(0, r, a, t); + tcg_temp_free_vec(t); + } +} + +void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + if (TCG_TARGET_HAS_orc_vec) { + vec_gen_op3(INDEX_op_orc_vec, 0, r, a, b); + } else { + TCGv_vec t = tcg_temp_new_vec_matching(r); + tcg_gen_not_vec(0, t, b); + tcg_gen_or_vec(0, r, a, t); + tcg_temp_free_vec(t); + } +} + +void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + if (TCG_TARGET_HAS_not_vec) { + vec_gen_op2(INDEX_op_not_vec, 0, r, a); + } else { + TCGv_vec t = tcg_const_ones_vec_matching(r); + tcg_gen_xor_vec(0, r, a, t); + tcg_temp_free_vec(t); + } +} + +void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + if (TCG_TARGET_HAS_neg_vec) { + vec_gen_op2(INDEX_op_neg_vec, vece, r, a); + } else { + TCGv_vec t = tcg_const_zeros_vec_matching(r); + tcg_gen_sub_vec(vece, r, t, a); + tcg_temp_free_vec(t); + } +} diff --git a/tcg/tcg.c b/tcg/tcg.c index 93caa0be93..16b8faf66f 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -106,6 +106,18 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, tcg_target_long arg); static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args, const int *const_args); +#if TCG_TARGET_MAYBE_vec +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl, + unsigned vece, const TCGArg *args, + const int *const_args); +#else +static inline void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl, + unsigned vece, const TCGArg *args, + const int *const_args) +{ + g_assert_not_reached(); +} +#endif static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1, intptr_t arg2); static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -146,8 +158,7 @@ struct tcg_region_state { }; static struct tcg_region_state region; - -static TCGRegSet tcg_target_available_regs[2]; +static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT]; static TCGRegSet tcg_target_call_clobber_regs; #if TCG_TARGET_INSN_UNIT_SIZE == 1 @@ -1026,6 +1037,41 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local) return temp_tcgv_i64(t); } +TCGv_vec tcg_temp_new_vec(TCGType type) +{ + TCGTemp *t; + +#ifdef CONFIG_DEBUG_TCG + switch (type) { + case TCG_TYPE_V64: + assert(TCG_TARGET_HAS_v64); + break; + case TCG_TYPE_V128: + assert(TCG_TARGET_HAS_v128); + break; + case TCG_TYPE_V256: + assert(TCG_TARGET_HAS_v256); + break; + default: + g_assert_not_reached(); + } +#endif + + t = tcg_temp_new_internal(type, 0); + return temp_tcgv_vec(t); +} + +/* Create a new temp of the same type as an existing temp. */ +TCGv_vec tcg_temp_new_vec_matching(TCGv_vec match) +{ + TCGTemp *t = tcgv_vec_temp(match); + + tcg_debug_assert(t->temp_allocated != 0); + + t = tcg_temp_new_internal(t->base_type, 0); + return temp_tcgv_vec(t); +} + static void tcg_temp_free_internal(TCGTemp *ts) { TCGContext *s = tcg_ctx; @@ -1057,6 +1103,11 @@ void tcg_temp_free_i64(TCGv_i64 arg) tcg_temp_free_internal(tcgv_i64_temp(arg)); } +void tcg_temp_free_vec(TCGv_vec arg) +{ + tcg_temp_free_internal(tcgv_vec_temp(arg)); +} + TCGv_i32 tcg_const_i32(int32_t val) { TCGv_i32 t0; @@ -1114,6 +1165,9 @@ int tcg_check_temp_count(void) Test the runtime variable that controls each opcode. */ bool tcg_op_supported(TCGOpcode op) { + const bool have_vec + = TCG_TARGET_HAS_v64 | TCG_TARGET_HAS_v128 | TCG_TARGET_HAS_v256; + switch (op) { case INDEX_op_discard: case INDEX_op_set_label: @@ -1327,6 +1381,29 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_mulsh_i64: return TCG_TARGET_HAS_mulsh_i64; + case INDEX_op_mov_vec: + case INDEX_op_movi_vec: + case INDEX_op_dup_vec: + case INDEX_op_dupi_vec: + case INDEX_op_ld_vec: + case INDEX_op_st_vec: + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + return have_vec; + case INDEX_op_dup2_vec: + return have_vec && TCG_TARGET_REG_BITS == 32; + case INDEX_op_not_vec: + return have_vec && TCG_TARGET_HAS_not_vec; + case INDEX_op_neg_vec: + return have_vec && TCG_TARGET_HAS_neg_vec; + case INDEX_op_andc_vec: + return have_vec && TCG_TARGET_HAS_andc_vec; + case INDEX_op_orc_vec: + return have_vec && TCG_TARGET_HAS_orc_vec; + case NB_OPS: break; } @@ -1661,6 +1738,14 @@ void tcg_dump_ops(TCGContext *s) nb_iargs = def->nb_iargs; nb_cargs = def->nb_cargs; + if (c == INDEX_op_movi_vec) { + nb_cargs = (64 / TCG_TARGET_REG_BITS) << TCGOP_VECL(op); + } + if (def->flags & TCG_OPF_VECTOR) { + col += qemu_log("v%d,e%d,", 64 << TCGOP_VECL(op), + 8 << TCGOP_VECE(op)); + } + k = 0; for (i = 0; i < nb_oargs; i++) { if (k != 0) { @@ -2890,8 +2975,13 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op) } /* emit instruction */ - tcg_out_op(s, op->opc, new_args, const_args); - + if (def->flags & TCG_OPF_VECTOR) { + tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op), + new_args, const_args); + } else { + tcg_out_op(s, op->opc, new_args, const_args); + } + /* move the outputs in the correct register if needed */ for(i = 0; i < nb_oargs; i++) { ts = arg_temp(op->args[i]); @@ -3239,10 +3329,12 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb) switch (opc) { case INDEX_op_mov_i32: case INDEX_op_mov_i64: + case INDEX_op_mov_vec: tcg_reg_alloc_mov(s, op); break; case INDEX_op_movi_i32: case INDEX_op_movi_i64: + case INDEX_op_dupi_vec: tcg_reg_alloc_movi(s, op); break; case INDEX_op_insn_start: diff --git a/tcg/README b/tcg/README index 03bfb6acd4..e14990fb9b 100644 --- a/tcg/README +++ b/tcg/README @@ -503,6 +503,64 @@ of the memory access. For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a 64-bit memory access specified in flags. +********* Host vector operations + +All of the vector ops have two parameters, TCGOP_VECL & TCGOP_VECE. +The former specifies the length of the vector in log2 64-bit units; the +later specifies the length of the element (if applicable) in log2 8-bit units. +E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. + +* mov_vec v0, v1 +* ld_vec v0, t1 +* st_vec v0, t1 + + Move, load and store. + +* movi_vec v0, a, b, ... + + Move with constant data. There are arguments to hold the entire + vector value, stored little-endian. Note that the way MAX_OPC_PARAM + is sized for 64- and 32-bit hosts, there are enough slots for v256, + but not a future v512. + + Prefer dupi_vec when possible. + +* dup_vec v0, r1 + + Duplicate the low N bits of R1 into VECL/VECE copies across V0. + +* dupi_vec v0, c + + Similarly, for a constant. + Smaller values will be replicated to host register size by the expanders. + +* dup2_vec v0, r1, r2 + + Duplicate r2:r1 into VECL/64 copies across V0. This opcode is + only present for 32-bit hosts. + +* add_vec v0, v1, v2 + + v0 = v1 + v2, in elements across the vector. + +* sub_vec v0, v1, v2 + + Similarly, v0 = v1 - v2. + +* neg_vec v0, v1 + + Similarly, v0 = -v1. + +* and_vec v0, v1, v2 +* or_vec v0, v1, v2 +* xor_vec v0, v1, v2 +* andc_vec v0, v1, v2 +* orc_vec v0, v1, v2 +* not_vec v0, v1 + + Similarly, logical operations with and without compliment. + Note that VECE is unused. + ********* Note 1: Some shortcuts are defined when the last operand is known to be From patchwork Sat Jan 6 03:13:26 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123603 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp123495qgn; Fri, 5 Jan 2018 19:17:24 -0800 (PST) X-Google-Smtp-Source: ACJfBotAXMfkQ9RFNiYuQ4Yr3YZrc7yNRN+x6SWOkNVFXPwSiaGPS8DpgOIzusW5gIoor2KAPyhj X-Received: by 10.37.68.136 with SMTP id r130mr4743609yba.3.1515208644037; Fri, 05 Jan 2018 19:17:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208644; cv=none; d=google.com; s=arc-20160816; b=U+NRVg2UdLPScsii3tTjFjEp2Euvi3nf84J72FddOO2AQmyRTP6N4pIN1nadvJ5zYT M/osy7HbIgGonv1+QlU4oSpZHjoe16XnlPuT4PDtRGCoyVbdwpU6u6/uzFQsWA0SH3UT 3yhbh4F44+7Imp3WqePgMEKxVPi1oIa0HHe0RjfQh5L4sx+giw1zfZWVpbhnmATc6H7N ZfwCKHzhOrkALW2KDwoTb0c36nOZJ7OY2u2Bjge8xGDHhQ9rX3S3RY5EbL5fFpfZo7pC +cueBeCMyy81WhPRCdQ+rotRxCyCTygDGDScX7jWe11i5r4AgwTxDlRhCHX0+kRw6mTf 68Wg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=RK3FS0bcnCLkz74Jyk8RSukA5u5ipauxagbXSJKNJFY=; b=ztczr0/bZYzcCkA8NgIh1NsoEKcHu31fpzd751Jt8rtRMW8UXSp+eRR24hRJX5sBoF 8/dqFhPGzkrCM66s4jprSGwysFgBnetQtmEqgxzjI/NdbUFlfeiEe51bJJdnG6vYYKpI fYnjAt6E4qKCP1Ein85lrDGASVPhbdHvSkL+aL5Vk41TEaTCmTFH4XmpVciexS9Lh7ek KK8tuUYp+ihybZZ9LGqAjLUf+GM5PYS7kDtKWXh/2483RVZ/a6lOs+iUA40fSVrPWLr8 Qx/wtSfOCgKq/nCiUIDFdsZg4ugxjT55pRXBaWkK8BrPEsNzbbXot+5U0Ohjl+5IzZnO UqGA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=aooeaHn4; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id m25si1466617ywh.91.2018.01.05.19.17.23 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:17:24 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=aooeaHn4; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44039 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXez5-0001JP-GB for patch@linaro.org; Fri, 05 Jan 2018 22:17:23 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48202) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevl-0007L0-Kq for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:00 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevk-0004i1-3P for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:57 -0500 Received: from mail-pg0-x244.google.com ([2607:f8b0:400e:c05::244]:46438) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevj-0004g6-Rz for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:56 -0500 Received: by mail-pg0-x244.google.com with SMTP id r2so2708151pgq.13 for ; Fri, 05 Jan 2018 19:13:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=RK3FS0bcnCLkz74Jyk8RSukA5u5ipauxagbXSJKNJFY=; b=aooeaHn4lBtaVJUDr4Ds/rr1W4ekFY2mFGjJYNW+cvV2yAjqQkrRnOkbqBBmG80rHx UM2GlRTc8R2RceEhzRLy01aC12xtyeagMAf1RivVxGe/zD3NkCoZbHMTiDbkGdXmQ93Q 1iy97UiEJ+A5tejgrV4z3rATLz0kgnXcVcQpY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=RK3FS0bcnCLkz74Jyk8RSukA5u5ipauxagbXSJKNJFY=; b=U4M3O1J4FMr3BJPDdjaJllRornHEUdGBk/0OIanFwLkVajd7YRjaQQREFMYmi6qsFa xFNda9Q2Dp5SzcyLQoU7Wi+J73/+bJ/GOIPxgB5EByPs/DL7dqCy+27rCW2n7n5732px 85pkcQak3V2y2JhoDKu9Oxj7S8y3GgFJ8gtvLrURFtyO8TYnX+neuuJf7iUgMi4EaPiq AI3JzYVJiMowqQbUQdTdja5O3CJPXxReWsRvCmmXTVPKQg1r+P+tuQWLU+x3TcYuCd56 l6Rn8UltAsO5DMUeLOBDt+K/6wraexCTkyeflU66HeKY9R6bVfCO455xfG9/FhnE0zi8 StLA== X-Gm-Message-State: AKGB3mJleQeKoumojjuw4dXT+asik4CyERqh/7ZXXcHAGh+YW9rBWMzN lfy6etfS+PyB9EiyOwljJVsKPJujpI4= X-Received: by 10.99.178.78 with SMTP id t14mr3997971pgo.296.1515208434589; Fri, 05 Jan 2018 19:13:54 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.13.52 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:13:53 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:26 -0800 Message-Id: <20180106031346.6650-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::244 Subject: [Qemu-devel] [PATCH v8 03/23] tcg: Standardize integral arguments to expanders X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Some functions use intN_t arguments, some use uintN_t, some just used "unsigned". To aid putting function pointers in tables, we need consistency. Signed-off-by: Richard Henderson --- tcg/tcg-op.h | 16 ++++++++-------- tcg/tcg-op.c | 42 +++++++++++++++++++++--------------------- 2 files changed, 29 insertions(+), 29 deletions(-) -- 2.14.3 diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 9b0560e4d3..df2eabaa67 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -269,12 +269,12 @@ void tcg_gen_mb(TCGBar); void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2); void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); -void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2); +void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); void tcg_gen_xori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); -void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2); -void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2); -void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2); +void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); +void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); +void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2); void tcg_gen_div_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2); void tcg_gen_rem_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2); @@ -458,12 +458,12 @@ static inline void tcg_gen_not_i32(TCGv_i32 ret, TCGv_i32 arg) void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2); void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); -void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2); +void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); -void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2); -void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2); -void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2); +void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); +void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); +void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); void tcg_gen_muli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2); void tcg_gen_div_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2); void tcg_gen_rem_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2); diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c index 0c509bfe46..3467787323 100644 --- a/tcg/tcg-op.c +++ b/tcg/tcg-op.c @@ -140,7 +140,7 @@ void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2) } } -void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2) +void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2) { TCGv_i32 t0; /* Some cases can be optimized here. */ @@ -148,17 +148,17 @@ void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2) case 0: tcg_gen_movi_i32(ret, 0); return; - case 0xffffffffu: + case -1: tcg_gen_mov_i32(ret, arg1); return; - case 0xffu: + case 0xff: /* Don't recurse with tcg_gen_ext8u_i32. */ if (TCG_TARGET_HAS_ext8u_i32) { tcg_gen_op2_i32(INDEX_op_ext8u_i32, ret, arg1); return; } break; - case 0xffffu: + case 0xffff: if (TCG_TARGET_HAS_ext16u_i32) { tcg_gen_op2_i32(INDEX_op_ext16u_i32, ret, arg1); return; @@ -199,9 +199,9 @@ void tcg_gen_xori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2) } } -void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2) +void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2) { - tcg_debug_assert(arg2 < 32); + tcg_debug_assert(arg2 >= 0 && arg2 < 32); if (arg2 == 0) { tcg_gen_mov_i32(ret, arg1); } else { @@ -211,9 +211,9 @@ void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2) } } -void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2) +void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2) { - tcg_debug_assert(arg2 < 32); + tcg_debug_assert(arg2 >= 0 && arg2 < 32); if (arg2 == 0) { tcg_gen_mov_i32(ret, arg1); } else { @@ -223,9 +223,9 @@ void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2) } } -void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2) +void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2) { - tcg_debug_assert(arg2 < 32); + tcg_debug_assert(arg2 >= 0 && arg2 < 32); if (arg2 == 0) { tcg_gen_mov_i32(ret, arg1); } else { @@ -1201,7 +1201,7 @@ void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) } } -void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2) +void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) { TCGv_i64 t0; @@ -1216,23 +1216,23 @@ void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2) case 0: tcg_gen_movi_i64(ret, 0); return; - case 0xffffffffffffffffull: + case -1: tcg_gen_mov_i64(ret, arg1); return; - case 0xffull: + case 0xff: /* Don't recurse with tcg_gen_ext8u_i64. */ if (TCG_TARGET_HAS_ext8u_i64) { tcg_gen_op2_i64(INDEX_op_ext8u_i64, ret, arg1); return; } break; - case 0xffffu: + case 0xffff: if (TCG_TARGET_HAS_ext16u_i64) { tcg_gen_op2_i64(INDEX_op_ext16u_i64, ret, arg1); return; } break; - case 0xffffffffull: + case 0xffffffffu: if (TCG_TARGET_HAS_ext32u_i64) { tcg_gen_op2_i64(INDEX_op_ext32u_i64, ret, arg1); return; @@ -1332,9 +1332,9 @@ static inline void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1, } } -void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) +void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) { - tcg_debug_assert(arg2 < 64); + tcg_debug_assert(arg2 >= 0 && arg2 < 64); if (TCG_TARGET_REG_BITS == 32) { tcg_gen_shifti_i64(ret, arg1, arg2, 0, 0); } else if (arg2 == 0) { @@ -1346,9 +1346,9 @@ void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) } } -void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) +void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) { - tcg_debug_assert(arg2 < 64); + tcg_debug_assert(arg2 >= 0 && arg2 < 64); if (TCG_TARGET_REG_BITS == 32) { tcg_gen_shifti_i64(ret, arg1, arg2, 1, 0); } else if (arg2 == 0) { @@ -1360,9 +1360,9 @@ void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) } } -void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2) +void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2) { - tcg_debug_assert(arg2 < 64); + tcg_debug_assert(arg2 >= 0 && arg2 < 64); if (TCG_TARGET_REG_BITS == 32) { tcg_gen_shifti_i64(ret, arg1, arg2, 1, 1); } else if (arg2 == 0) { From patchwork Sat Jan 6 03:13:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123607 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp125559qgn; Fri, 5 Jan 2018 19:20:27 -0800 (PST) X-Google-Smtp-Source: ACJfBosbUm10OEJBBQqHY6GsUk4q6k+mVTa/mjqNHsTmtINkeXFUYLJ2e9djt2njWv6g/7H6iPno X-Received: by 10.129.199.3 with SMTP id m3mr4737867ywi.334.1515208827177; Fri, 05 Jan 2018 19:20:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208827; cv=none; d=google.com; s=arc-20160816; b=HK4FErqAIZndO8Beo6i2SV80RG5rJa2G0xxPFD0cq2A2hBE+rU6Nv+GBxogKgktpiC eqKsiTRU1mxLc5JH7atYF8Vc86VD8WKs5NUI1D3RmLgGNGSkxl2Qqp7vDMqYgbmNYidU 1SqScjTUCcZ1t0EhzWnSJa2H7GpI2ogP6mY8r7cc616/yL+TJQ53i1HizyMmzdMY5DoN DI3Lea1iM3SM+E01/rEANLFxAfzJqF3j5Aab8Fd2qn1FXTwvWnIgJkqxOzM8IG3IEsfJ ZPA65T6y+S0G24LhGe8bc4EdVyT7JDbUL563m1cmEFWPUodcmIF5lbRALiJtVFxnC4jd p0Hg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=pQkYQBDAQyxZ33ZUqpsmWtmSxg+9ZveMWSdMTMuRwLs=; b=DGW7VSFUIHW/hgCTgYGAUR9zFvqqJLa7cqWycpjof+SWZNVf03vkKlR3qSCdp/jDSJ 5gDsXtyBMkN0rp14rzdltfU0MQO5LSEn/8weo+oJIJWBOpNAvxw2VtvumGNlQir0yZNC Npn9CACN3oGDxNOlOQ75cetFyNhJv+67BaMOK/Sbgjm784vepRhedv4EOcCgnCNbMPOQ F/e+aV+BO9C1p8NDCt4ULK8YLEYmTyzFM3/PBV8OB5kx5kVCpgBYxhrks6h01a+rFhQJ Pyg05asrSOusoeAWxi86NYzhlbMtOKOcVu/BFVd3xlm56j3lAEgAhFE5ySR2MXTsO9uK Gk+w== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=dWp5cD31; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id c14si1520607ybm.107.2018.01.05.19.20.26 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:20:27 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=dWp5cD31; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44054 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf22-0003pq-GB for patch@linaro.org; Fri, 05 Jan 2018 22:20:26 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48294) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevv-0007V4-RK for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:15 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevn-0004pt-Sq for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:07 -0500 Received: from mail-pg0-x236.google.com ([2607:f8b0:400e:c05::236]:44055) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevn-0004nq-Fc for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:59 -0500 Received: by mail-pg0-x236.google.com with SMTP id i5so2710287pgq.11 for ; Fri, 05 Jan 2018 19:13:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=pQkYQBDAQyxZ33ZUqpsmWtmSxg+9ZveMWSdMTMuRwLs=; b=dWp5cD31vslYP4ez8bjdr/W18kp9hKgsvLGR3x4b7bqcL3T1WcU78P1tQ+JtDeYk6Q Ma0OVwLG+SDZ58MD2VOo/geljFpW/UBO7NU8Ue0FzUPGSp46kb+uHtqxfJQR2TiwuXy2 DwNFqz7RLPmAioTW0hJoU1fVYItASgKSrB/4o= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=pQkYQBDAQyxZ33ZUqpsmWtmSxg+9ZveMWSdMTMuRwLs=; b=EXGPGeWueICoGqRo9cpSG795+i8Qrqgf33HU8OtvmpKLKVTSnpKaDPZ2CT2G/bRI2U XUi/bL2xMht18xd+CHaH165GWrr0p/V7dPtYlAB6ogzXxPK3Hce6ClV1DZWgXE0oTU01 cyBxQ9dunBK+XzQ5mGmM+Afi6DJ5tkmzycuEXh3UoSuPJIkivHanrYAqf+U+25gMVIo8 CUhZAkl8ZrcUK19LncAVyU5Y9Efv1UjPVlJWngGkYbNiWr6cyVlApfu2l1qI15VhzMNy x6A9sT5KmiB3ev+nf6mOgNByd9iM1zewb43CVdR3R1fn9GOL1/wYGIZkrAw0enbouvQb Ey5A== X-Gm-Message-State: AKGB3mKMTgDmafKcZYi6ydvhhmNhLMerws1nprq9mer5AXpz7oEs3uvv ij4GDO+HEBvRY0sKJ/L4h3qt86/png4= X-Received: by 10.98.211.130 with SMTP id z2mr4654150pfk.85.1515208436204; Fri, 05 Jan 2018 19:13:56 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.13.54 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:13:55 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:27 -0800 Message-Id: <20180106031346.6650-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::236 Subject: [Qemu-devel] [PATCH v8 04/23] tcg: Add generic vector expanders X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- Makefile.target | 2 +- accel/tcg/tcg-runtime.h | 29 + tcg/tcg-gvec-desc.h | 49 ++ tcg/tcg-op-gvec.h | 194 ++++++ tcg/tcg-op.h | 1 + tcg/tcg-opc.h | 6 + tcg/tcg.h | 15 + accel/tcg/tcg-runtime-gvec.c | 295 +++++++++ tcg/tcg-op-gvec.c | 1403 ++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 36 +- tcg/tcg.c | 13 +- accel/tcg/Makefile.objs | 2 +- 12 files changed, 2032 insertions(+), 13 deletions(-) create mode 100644 tcg/tcg-gvec-desc.h create mode 100644 tcg/tcg-op-gvec.h create mode 100644 accel/tcg/tcg-runtime-gvec.c create mode 100644 tcg/tcg-op-gvec.c -- 2.14.3 diff --git a/Makefile.target b/Makefile.target index 7f30a1e725..6549481096 100644 --- a/Makefile.target +++ b/Makefile.target @@ -93,7 +93,7 @@ all: $(PROGS) stap # cpu emulator library obj-y += exec.o obj-y += accel/ -obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o +obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index 1df17d0ba9..76ee41ce58 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -134,3 +134,32 @@ GEN_ATOMIC_HELPERS(xor_fetch) GEN_ATOMIC_HELPERS(xchg) #undef GEN_ATOMIC_HELPERS + +DEF_HELPER_FLAGS_3(gvec_mov, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_dup8, TCG_CALL_NO_RWG, void, ptr, i32, i32) +DEF_HELPER_FLAGS_3(gvec_dup16, TCG_CALL_NO_RWG, void, ptr, i32, i32) +DEF_HELPER_FLAGS_3(gvec_dup32, TCG_CALL_NO_RWG, void, ptr, i32, i32) +DEF_HELPER_FLAGS_3(gvec_dup64, TCG_CALL_NO_RWG, void, ptr, i32, i64) + +DEF_HELPER_FLAGS_4(gvec_add8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_add16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_add32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_add64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_sub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_neg64, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_not, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_and, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/tcg-gvec-desc.h b/tcg/tcg-gvec-desc.h new file mode 100644 index 0000000000..8ba9a8168d --- /dev/null +++ b/tcg/tcg-gvec-desc.h @@ -0,0 +1,49 @@ +/* + * Generic vector operation descriptor + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +/* ??? These bit widths are set for ARM SVE, maxing out at 256 byte vectors. */ +#define SIMD_OPRSZ_SHIFT 0 +#define SIMD_OPRSZ_BITS 5 + +#define SIMD_MAXSZ_SHIFT (SIMD_OPRSZ_SHIFT + SIMD_OPRSZ_BITS) +#define SIMD_MAXSZ_BITS 5 + +#define SIMD_DATA_SHIFT (SIMD_MAXSZ_SHIFT + SIMD_MAXSZ_BITS) +#define SIMD_DATA_BITS (32 - SIMD_DATA_SHIFT) + +/* Create a descriptor from components. */ +uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data); + +/* Extract the operation size from a descriptor. */ +static inline intptr_t simd_oprsz(uint32_t desc) +{ + return (extract32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS) + 1) * 8; +} + +/* Extract the max vector size from a descriptor. */ +static inline intptr_t simd_maxsz(uint32_t desc) +{ + return (extract32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS) + 1) * 8; +} + +/* Extract the operation-specific data from a descriptor. */ +static inline int32_t simd_data(uint32_t desc) +{ + return sextract32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS); +} diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h new file mode 100644 index 0000000000..9d795fbb30 --- /dev/null +++ b/tcg/tcg-op-gvec.h @@ -0,0 +1,194 @@ +/* + * Generic vector operation expansion + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +/* + * "Generic" vectors. All operands are given as offsets from ENV, + * and therefore cannot also be allocated via tcg_global_mem_new_*. + * OPRSZ is the byte size of the vector upon which the operation is performed. + * MAXSZ is the byte size of the full vector; bytes beyond OPSZ are cleared. + * + * All sizes must be 8 or any multiple of 16. + * When OPRSZ is 8, the alignment may be 8, otherwise must be 16. + * Operands may completely, but not partially, overlap. + */ + +/* Expand a call to a gvec-style helper, with pointers to two vector + operands, and a descriptor (see tcg-gvec-desc.h). */ +typedef void gen_helper_gvec_2(TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_2 *fn); + +/* Similarly, passing an extra pointer (e.g. env or float_status). */ +typedef void gen_helper_gvec_2_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_2_ptr *fn); + +/* Similarly, with three vector operands. */ +typedef void gen_helper_gvec_3(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_3 *fn); + +/* Similarly, with four vector operands. */ +typedef void gen_helper_gvec_4(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_4 *fn); + +/* Similarly, with five vector operands. */ +typedef void gen_helper_gvec_5(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_5_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t xofs, uint32_t oprsz, + uint32_t maxsz, int32_t data, gen_helper_gvec_5 *fn); + +typedef void gen_helper_gvec_3_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_3_ptr *fn); + +typedef void gen_helper_gvec_4_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, + TCGv_ptr, TCGv_ptr, TCGv_i32); +void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz, + uint32_t maxsz, int32_t data, + gen_helper_gvec_4_ptr *fn); + +/* Expand a gvec operation. Either inline or out-of-line depending on + the actual vector size and the operations supported by the host. */ +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_2 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; +} GVecGen2; + +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_3 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; + /* Load dest as a 3rd source operand. */ + bool load_dest; +} GVecGen3; + +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64); + void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_4 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The data argument to the out-of-line helper. */ + uint32_t data; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; +} GVecGen4; + +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen2 *); +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen3 *); +void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen4 *); + +/* Expand a specific vector operation. */ + +void tcg_gen_gvec_mov(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); + +void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); + +void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); + +void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t s, uint32_t m); +void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s, + uint32_t m, TCGv_i32); +void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s, + uint32_t m, TCGv_i64); + +void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t s, uint32_t m, uint8_t x); +void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x); +void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x); +void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x); + +/* + * 64-bit vector operations. Use these when the register has been allocated + * with tcg_global_mem_new_i64, and so we cannot also address it via pointer. + * OPRSZ = MAXSZ = 8. + */ + +void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 a); +void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 a); +void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 a); + +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); + +void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); +void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index df2eabaa67..09fec5693f 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -914,6 +914,7 @@ void tcg_gen_dup8i_vec(TCGv_vec, uint32_t); void tcg_gen_dup16i_vec(TCGv_vec, uint32_t); void tcg_gen_dup32i_vec(TCGv_vec, uint32_t); void tcg_gen_dup64i_vec(TCGv_vec, uint64_t); +void tcg_gen_dupi_vec(unsigned vece, TCGv_vec, uint64_t); void tcg_gen_movi_v64(TCGv_vec, uint64_t); void tcg_gen_movi_v128(TCGv_vec, uint64_t, uint64_t); void tcg_gen_movi_v256(TCGv_vec, uint64_t, uint64_t, uint64_t, uint64_t); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index 4e62eda14b..b4e16cfbc3 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -229,6 +229,12 @@ DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) +DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) + +#if TCG_TARGET_MAYBE_vec +#include "tcg-target.opc.h" +#endif + #undef TLADDR_ARGS #undef DATA64_ARGS #undef IMPL diff --git a/tcg/tcg.h b/tcg/tcg.h index dce483b0ee..0e9d79bdd4 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -1207,6 +1207,21 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr); void tcg_register_jit(void *buf, size_t buf_size); +#if TCG_TARGET_MAYBE_vec +/* Return zero if the tuple (opc, type, vece) is unsupportable; + return > 0 if it is directly supportable; + return < 0 if we must call tcg_expand_vec_op. */ +int tcg_can_emit_vec_op(TCGOpcode, TCGType, unsigned); +#else +static inline int tcg_can_emit_vec_op(TCGOpcode o, TCGType t, unsigned ve) +{ + return 0; +} +#endif + +/* Expand the tuple (opc, type, vece) on the given arguments. */ +void tcg_expand_vec_op(TCGOpcode, TCGType, unsigned, TCGArg, ...); + /* * Memory helpers that will be used by TCG generated code. */ diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c new file mode 100644 index 0000000000..cd1ce12b7e --- /dev/null +++ b/accel/tcg/tcg-runtime-gvec.c @@ -0,0 +1,295 @@ +/* + * Generic vectorized operation runtime + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +#include "qemu/osdep.h" +#include "qemu/host-utils.h" +#include "cpu.h" +#include "exec/helper-proto.h" +#include "tcg-gvec-desc.h" + + +/* Virtually all hosts support 16-byte vectors. Those that don't can emulate + them via GCC's generic vector extension. This turns out to be simpler and + more reliable than getting the compiler to autovectorize. + + In tcg-op-gvec.c, we asserted that both the size and alignment + of the data are multiples of 16. */ + +typedef uint8_t vec8 __attribute__((vector_size(16))); +typedef uint16_t vec16 __attribute__((vector_size(16))); +typedef uint32_t vec32 __attribute__((vector_size(16))); +typedef uint64_t vec64 __attribute__((vector_size(16))); + +static inline void clear_high(void *d, intptr_t oprsz, uint32_t desc) +{ + intptr_t maxsz = simd_maxsz(desc); + intptr_t i; + + if (unlikely(maxsz > oprsz)) { + for (i = oprsz; i < maxsz; i += sizeof(uint64_t)) { + *(uint64_t *)(d + i) = 0; + } + } +} + +void HELPER(gvec_add8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) + *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) + *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) + *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_add64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) + *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) - *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) - *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) - *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) - *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg8)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = -*(vec8 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg16)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = -*(vec16 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg32)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = -*(vec32 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_neg64)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = -*(vec64 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mov)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + + memcpy(d, a, oprsz); + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_dup64)(void *d, uint32_t desc, uint64_t c) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + if (c == 0) { + oprsz = 0; + } else { + for (i = 0; i < oprsz; i += sizeof(uint64_t)) { + *(uint64_t *)(d + i) = c; + } + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_dup32)(void *d, uint32_t desc, uint32_t c) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + if (c == 0) { + oprsz = 0; + } else { + for (i = 0; i < oprsz; i += sizeof(uint32_t)) { + *(uint32_t *)(d + i) = c; + } + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_dup16)(void *d, uint32_t desc, uint32_t c) +{ + HELPER(gvec_dup32)(d, desc, 0x00010001 * (c & 0xffff)); +} + +void HELPER(gvec_dup8)(void *d, uint32_t desc, uint32_t c) +{ + HELPER(gvec_dup32)(d, desc, 0x01010101 * (c & 0xff)); +} + +void HELPER(gvec_not)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = ~*(vec64 *)(a + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_and)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) & *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_or)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) | *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_xor)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) ^ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_andc)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) &~ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) |~ *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c new file mode 100644 index 0000000000..2b7f4c8544 --- /dev/null +++ b/tcg/tcg-op-gvec.c @@ -0,0 +1,1403 @@ +/* + * Generic vector operation expansion + * + * Copyright (c) 2017 Linaro + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see . + */ + +#include "qemu/osdep.h" +#include "qemu-common.h" +#include "tcg.h" +#include "tcg-op.h" +#include "tcg-op-gvec.h" +#include "tcg-gvec-desc.h" + +#define REP8(x) ((x) * 0x0101010101010101ull) +#define REP16(x) ((x) * 0x0001000100010001ull) + +#define MAX_UNROLL 4 + +/* Verify vector size and alignment rules. OFS should be the OR of all + of the operand offsets so that we can check them all at once. */ +static void check_size_align(uint32_t oprsz, uint32_t maxsz, uint32_t ofs) +{ + uint32_t align = maxsz > 16 || oprsz >= 16 ? 15 : 7; + tcg_debug_assert(oprsz > 0); + tcg_debug_assert(oprsz <= maxsz); + tcg_debug_assert((oprsz & align) == 0); + tcg_debug_assert((maxsz & align) == 0); + tcg_debug_assert((ofs & align) == 0); +} + +/* Verify vector overlap rules for two operands. */ +static void check_overlap_2(uint32_t d, uint32_t a, uint32_t s) +{ + tcg_debug_assert(d == a || d + s <= a || a + s <= d); +} + +/* Verify vector overlap rules for three operands. */ +static void check_overlap_3(uint32_t d, uint32_t a, uint32_t b, uint32_t s) +{ + check_overlap_2(d, a, s); + check_overlap_2(d, b, s); + check_overlap_2(a, b, s); +} + +/* Verify vector overlap rules for four operands. */ +static void check_overlap_4(uint32_t d, uint32_t a, uint32_t b, + uint32_t c, uint32_t s) +{ + check_overlap_2(d, a, s); + check_overlap_2(d, b, s); + check_overlap_2(d, c, s); + check_overlap_2(a, b, s); + check_overlap_2(a, c, s); + check_overlap_2(b, c, s); +} + +/* Create a descriptor from components. */ +uint32_t simd_desc(uint32_t oprsz, uint32_t maxsz, int32_t data) +{ + uint32_t desc = 0; + + assert(oprsz % 8 == 0 && oprsz <= (8 << SIMD_OPRSZ_BITS)); + assert(maxsz % 8 == 0 && maxsz <= (8 << SIMD_MAXSZ_BITS)); + assert(data == sextract32(data, 0, SIMD_DATA_BITS)); + + oprsz = (oprsz / 8) - 1; + maxsz = (maxsz / 8) - 1; + desc = deposit32(desc, SIMD_OPRSZ_SHIFT, SIMD_OPRSZ_BITS, oprsz); + desc = deposit32(desc, SIMD_MAXSZ_SHIFT, SIMD_MAXSZ_BITS, maxsz); + desc = deposit32(desc, SIMD_DATA_SHIFT, SIMD_DATA_BITS, data); + + return desc; +} + +/* Generate a call to a gvec-style helper with two vector operands. */ +void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_2 *fn) +{ + TCGv_ptr a0, a1; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + + fn(a0, a1, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands. */ +void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_3 *fn) +{ + TCGv_ptr a0, a1, a2; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + + fn(a0, a1, a2, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with four vector operands. */ +void tcg_gen_gvec_4_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_4 *fn) +{ + TCGv_ptr a0, a1, a2, a3; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + a3 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + + fn(a0, a1, a2, a3, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with five vector operands. */ +void tcg_gen_gvec_5_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t xofs, uint32_t oprsz, + uint32_t maxsz, int32_t data, gen_helper_gvec_5 *fn) +{ + TCGv_ptr a0, a1, a2, a3, a4; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + a3 = tcg_temp_new_ptr(); + a4 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + tcg_gen_addi_ptr(a4, cpu_env, xofs); + + fn(a0, a1, a2, a3, a4, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_ptr(a4); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_2_ptr *fn) +{ + TCGv_ptr a0, a1; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + + fn(a0, a1, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with three vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_3_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + TCGv_ptr ptr, uint32_t oprsz, uint32_t maxsz, + int32_t data, gen_helper_gvec_3_ptr *fn) +{ + TCGv_ptr a0, a1, a2; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + + fn(a0, a1, a2, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_i32(desc); +} + +/* Generate a call to a gvec-style helper with four vector operands + and an extra pointer operand. */ +void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, TCGv_ptr ptr, uint32_t oprsz, + uint32_t maxsz, int32_t data, + gen_helper_gvec_4_ptr *fn) +{ + TCGv_ptr a0, a1, a2, a3; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + a2 = tcg_temp_new_ptr(); + a3 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + tcg_gen_addi_ptr(a2, cpu_env, bofs); + tcg_gen_addi_ptr(a3, cpu_env, cofs); + + fn(a0, a1, a2, a3, ptr, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_ptr(a2); + tcg_temp_free_ptr(a3); + tcg_temp_free_i32(desc); +} + +/* Return true if we want to implement something of OPRSZ bytes + in units of LNSZ. This limits the expansion of inline code. */ +static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz) +{ + uint32_t lnct = oprsz / lnsz; + return lnct >= 1 && lnct <= MAX_UNROLL; +} + +static void expand_clr(uint32_t dofs, uint32_t maxsz); + +/* Set OPRSZ bytes at DOFS to replications of IN or IN_C. */ +static void do_dup_i32(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i32 in, uint32_t in_c, + void (*ool)(TCGv_ptr, TCGv_i32, TCGv_i32)) +{ + TCGType type; + TCGv_vec t_vec; + uint32_t i; + + assert(vece <= MO_32); + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + type = TCG_TYPE_V256; + } else if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + type = TCG_TYPE_V128; + } else if (TCG_TARGET_HAS_v64 && check_size_impl(oprsz, 8)) { + type = TCG_TYPE_V64; + } else { + if (check_size_impl(oprsz, 4)) { + TCGv_i32 t_i32 = tcg_temp_new_i32(); + + if (in) { + switch (vece) { + case MO_8: + tcg_gen_deposit_i32(t_i32, in, in, 8, 24); + in = t_i32; + /* fallthru */ + case MO_16: + tcg_gen_deposit_i32(t_i32, in, in, 16, 16); + break; + default: + tcg_gen_mov_i32(t_i32, in); + break; + } + } else { + switch (vece) { + case MO_8: + in_c = (in_c & 0xff) * 0x01010101; + break; + case MO_16: + in_c = deposit32(in_c, 16, 16, in_c); + break; + } + tcg_gen_movi_i32(t_i32, in_c); + } + + for (i = 0; i < oprsz; i += 4) { + tcg_gen_st_i32(t_i32, cpu_env, dofs + i); + } + tcg_temp_free_i32(t_i32); + goto done; + } else { + TCGv_i32 t_i32 = in ? in : tcg_const_i32(in_c); + TCGv_ptr a0 = tcg_temp_new_ptr(); + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, 0)); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + ool(a0, desc, t_i32); + + tcg_temp_free_ptr(a0); + tcg_temp_free_i32(desc); + if (in == NULL) { + tcg_temp_free_i32(t_i32); + } + return; + } + } + + t_vec = tcg_temp_new_vec(type); + if (in) { + tcg_gen_dup_i32_vec(vece, t_vec, in); + } else { + switch (vece) { + case MO_8: + tcg_gen_dup8i_vec(t_vec, in_c); + break; + case MO_16: + tcg_gen_dup16i_vec(t_vec, in_c); + break; + default: + tcg_gen_dup32i_vec(t_vec, in_c); + break; + } + } + + i = 0; + if (TCG_TARGET_HAS_v256) { + for (; i + 32 <= oprsz; i += 32) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256); + } + } + if (TCG_TARGET_HAS_v128) { + for (; i + 16 <= oprsz; i += 16) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128); + } + } + if (TCG_TARGET_HAS_v64) { + for (; i < oprsz; i += 8) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64); + } + } + tcg_temp_free_vec(t_vec); + + done: + tcg_debug_assert(i == oprsz); + if (i < maxsz) { + expand_clr(dofs + i, maxsz - i); + } +} + +/* Likewise, but with 64-bit quantities. */ +static void do_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i64 in, uint64_t in_c) +{ + TCGType type; + TCGv_vec t_vec; + uint32_t i; + + assert(vece <= MO_64); + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) { + type = TCG_TYPE_V256; + } else if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) { + type = TCG_TYPE_V128; + } else if (TCG_TARGET_HAS_v64 && TCG_TARGET_REG_BITS == 32 + && check_size_impl(oprsz, 8)) { + type = TCG_TYPE_V64; + } else { + if (check_size_impl(oprsz, 8)) { + TCGv_i64 t_i64 = tcg_temp_new_i64(); + + if (in) { + switch (vece) { + case MO_8: + tcg_gen_deposit_i64(t_i64, in, in, 8, 56); + in = t_i64; + /* fallthru */ + case MO_16: + tcg_gen_deposit_i64(t_i64, in, in, 16, 48); + in = t_i64; + /* fallthru */ + case MO_32: + tcg_gen_deposit_i64(t_i64, in, in, 32, 32); + break; + default: + tcg_gen_mov_i64(t_i64, in); + break; + } + } else { + switch (vece) { + case MO_8: + in_c = (in_c & 0xff) * 0x0101010101010101ull; + break; + case MO_16: + in_c = (in_c & 0xffff) * 0x0001000100010001ull; + break; + case MO_32: + in_c = deposit64(in_c, 32, 32, in_c); + break; + } + tcg_gen_movi_i64(t_i64, in_c); + } + + for (i = 0; i < oprsz; i += 8) { + tcg_gen_st_i64(t_i64, cpu_env, dofs + i); + } + tcg_temp_free_i64(t_i64); + goto done; + } else { + TCGv_i64 t_i64 = in ? in : tcg_const_i64(in_c); + TCGv_ptr a0 = tcg_temp_new_ptr(); + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, 0)); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + gen_helper_gvec_dup64(a0, desc, t_i64); + + tcg_temp_free_ptr(a0); + tcg_temp_free_i32(desc); + if (in == NULL) { + tcg_temp_free_i64(t_i64); + } + return; + } + } + + t_vec = tcg_temp_new_vec(type); + if (in) { + tcg_gen_dup_i64_vec(vece, t_vec, in); + } else { + switch (vece) { + case MO_8: + tcg_gen_dup8i_vec(t_vec, in_c); + break; + case MO_16: + tcg_gen_dup16i_vec(t_vec, in_c); + break; + case MO_32: + tcg_gen_dup32i_vec(t_vec, in_c); + break; + default: + tcg_gen_dup64i_vec(t_vec, in_c); + break; + } + } + + i = 0; + if (TCG_TARGET_HAS_v256) { + for (; i + 32 <= oprsz; i += 32) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256); + } + } + if (TCG_TARGET_HAS_v128) { + for (; i + 16 <= oprsz; i += 16) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128); + } + } + if (TCG_TARGET_HAS_v64) { + for (; i < oprsz; i += 8) { + tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64); + } + } + tcg_temp_free_vec(t_vec); + + done: + tcg_debug_assert(i == oprsz); + if (i < maxsz) { + expand_clr(dofs + i, maxsz - i); + } +} + +/* Likewise, but with zero. */ +static void expand_clr(uint32_t dofs, uint32_t maxsz) +{ + if (TCG_TARGET_REG_BITS == 64) { + do_dup_i64(MO_64, dofs, maxsz, maxsz, NULL, 0); + } else { + do_dup_i32(MO_32, dofs, maxsz, maxsz, NULL, 0, gen_helper_gvec_dup32); + } +} + +/* Expand OPSZ bytes worth of two-operand operations using i32 elements. */ +static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + void (*fni)(TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < oprsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + fni(t0, t0); + tcg_gen_st_i32(t0, cpu_env, dofs + i); + } + tcg_temp_free_i32(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ +static void expand_3_i32(uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, bool load_dest, + void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + TCGv_i32 t2 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < oprsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + tcg_gen_ld_i32(t1, cpu_env, bofs + i); + if (load_dest) { + tcg_gen_ld_i32(t2, cpu_env, dofs + i); + } + fni(t2, t0, t1); + tcg_gen_st_i32(t2, cpu_env, dofs + i); + } + tcg_temp_free_i32(t2); + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ +static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t opsz, + void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + TCGv_i32 t2 = tcg_temp_new_i32(); + TCGv_i32 t3 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t1, cpu_env, aofs + i); + tcg_gen_ld_i32(t2, cpu_env, bofs + i); + tcg_gen_ld_i32(t3, cpu_env, cofs + i); + fni(t0, t1, t2, t3); + tcg_gen_st_i32(t0, cpu_env, dofs + i); + } + tcg_temp_free_i32(t3); + tcg_temp_free_i32(t2); + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + +/* Expand OPSZ bytes worth of two-operand operations using i64 elements. */ +static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + void (*fni)(TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < oprsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + fni(t0, t0); + tcg_gen_st_i64(t0, cpu_env, dofs + i); + } + tcg_temp_free_i64(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ +static void expand_3_i64(uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, bool load_dest, + void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < oprsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + tcg_gen_ld_i64(t1, cpu_env, bofs + i); + if (load_dest) { + tcg_gen_ld_i64(t2, cpu_env, dofs + i); + } + fni(t2, t0, t1); + tcg_gen_st_i64(t2, cpu_env, dofs + i); + } + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ +static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t cofs, uint32_t opsz, + void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t1, cpu_env, aofs + i); + tcg_gen_ld_i64(t2, cpu_env, bofs + i); + tcg_gen_ld_i64(t3, cpu_env, cofs + i); + fni(t0, t1, t2, t3); + tcg_gen_st_i64(t0, cpu_env, dofs + i); + } + tcg_temp_free_i64(t3); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + +/* Expand OPSZ bytes worth of two-operand operations using host vectors. */ +static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t tysz, TCGType type, + void (*fni)(unsigned, TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < oprsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + fni(vece, t0, t0); + tcg_gen_st_vec(t0, cpu_env, dofs + i); + } + tcg_temp_free_vec(t0); +} + +/* Expand OPSZ bytes worth of three-operand operations using host vectors. */ +static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, + uint32_t tysz, TCGType type, bool load_dest, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + TCGv_vec t2 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < oprsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + tcg_gen_ld_vec(t1, cpu_env, bofs + i); + if (load_dest) { + tcg_gen_ld_vec(t2, cpu_env, dofs + i); + } + fni(vece, t2, t0, t1); + tcg_gen_st_vec(t2, cpu_env, dofs + i); + } + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + +/* Expand OPSZ bytes worth of four-operand operations using host vectors. */ +static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t cofs, uint32_t opsz, + uint32_t tysz, TCGType type, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, + TCGv_vec, TCGv_vec)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + TCGv_vec t2 = tcg_temp_new_vec(type); + TCGv_vec t3 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t1, cpu_env, aofs + i); + tcg_gen_ld_vec(t2, cpu_env, bofs + i); + tcg_gen_ld_vec(t3, cpu_env, cofs + i); + fni(vece, t0, t1, t2, t3); + tcg_gen_st_vec(t0, cpu_env, dofs + i); + } + tcg_temp_free_vec(t3); + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + +/* Expand a vector two-operand operation. */ +void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && g->fniv && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_2_vec(g->vece, dofs, aofs, done, 32, TCG_TYPE_V256, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && g->fniv && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_2_vec(g->vece, dofs, aofs, done, 16, TCG_TYPE_V128, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && g->fniv && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_2_vec(g->vece, dofs, aofs, done, 8, TCG_TYPE_V64, g->fniv); + } else if (g->fni8) { + expand_2_i64(dofs, aofs, done, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_2_i32(dofs, aofs, done, g->fni4); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, g->fno); +} + +/* Expand a vector three-operand operation. */ +void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && g->fniv && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_3_vec(g->vece, dofs, aofs, bofs, done, 32, TCG_TYPE_V256, + g->load_dest, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && g->fniv && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_3_vec(g->vece, dofs, aofs, bofs, done, 16, TCG_TYPE_V128, + g->load_dest, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && g->fniv && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_3_vec(g->vece, dofs, aofs, bofs, done, 8, TCG_TYPE_V64, + g->load_dest, g->fniv); + } else if (g->fni8) { + expand_3_i64(dofs, aofs, bofs, done, g->load_dest, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_3_i32(dofs, aofs, bofs, done, g->load_dest, g->fni4); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, g->fno); +} + +/* Expand a vector four-operand operation. */ +void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs, + uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs | bofs | cofs); + check_overlap_4(dofs, aofs, bofs, cofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && g->fniv && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 32, TCG_TYPE_V256, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && g->fniv && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 16, TCG_TYPE_V128, g->fniv); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && g->fniv && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_4_vec(g->vece, dofs, aofs, bofs, cofs, done, + 8, TCG_TYPE_V64, g->fniv); + } else if (g->fni8) { + expand_4_i64(dofs, aofs, bofs, cofs, done, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_4_i32(dofs, aofs, bofs, cofs, done, g->fni4); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_4_ool(dofs, aofs, bofs, cofs, oprsz, maxsz, g->data, g->fno); +} + +/* + * Expand specific vector operations. + */ + +static void vec_mov2(unsigned vece, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_mov_vec(a, b); +} + +void tcg_gen_gvec_mov(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2 g = { + .fni8 = tcg_gen_mov_i64, + .fniv = vec_mov2, + .fno = gen_helper_gvec_mov, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + if (dofs != aofs) { + tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &g); + } else { + check_size_align(oprsz, maxsz, dofs); + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + } +} + +void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i32 in) +{ + typedef void dup_fn(TCGv_ptr, TCGv_i32, TCGv_i32); + static dup_fn * const fns[3] = { + gen_helper_gvec_dup8, + gen_helper_gvec_dup16, + gen_helper_gvec_dup32 + }; + + check_size_align(oprsz, maxsz, dofs); + tcg_debug_assert(vece <= MO_32); + do_dup_i32(vece, dofs, oprsz, maxsz, in, 0, fns[vece]); +} + +void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, TCGv_i64 in) +{ + check_size_align(oprsz, maxsz, dofs); + tcg_debug_assert(vece <= MO_64); + if (vece <= MO_32) { + /* This works for both host register sizes. */ + tcg_gen_gvec_dup_i32(vece, dofs, oprsz, maxsz, (TCGv_i32)in); + } else { + do_dup_i64(vece, dofs, oprsz, maxsz, in, 0); + } +} + +void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + if (vece <= MO_32) { + TCGv_i32 in = tcg_temp_new_i32(); + switch (vece) { + case MO_8: + tcg_gen_ld8u_i32(in, cpu_env, aofs); + break; + case MO_16: + tcg_gen_ld16u_i32(in, cpu_env, aofs); + break; + case MO_32: + tcg_gen_ld_i32(in, cpu_env, aofs); + break; + } + tcg_gen_gvec_dup_i32(vece, dofs, oprsz, maxsz, in); + tcg_temp_free_i32(in); + } else if (vece == MO_64) { + TCGv_i64 in = tcg_temp_new_i64(); + tcg_gen_ld_i64(in, cpu_env, aofs); + tcg_gen_gvec_dup_i64(MO_64, dofs, oprsz, maxsz, in); + tcg_temp_free_i64(in); + } else { + /* 128-bit duplicate. */ + /* ??? Dup to 256-bit vector. */ + int i; + + tcg_debug_assert(vece == 4); + tcg_debug_assert(oprsz >= 16); + if (TCG_TARGET_HAS_v128) { + TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V128); + + tcg_gen_ld_vec(in, cpu_env, aofs); + for (i = 0; i < oprsz; i += 16) { + tcg_gen_st_vec(in, cpu_env, dofs + i); + } + tcg_temp_free_vec(in); + } else { + TCGv_i64 in0 = tcg_temp_new_i64(); + TCGv_i64 in1 = tcg_temp_new_i64(); + + tcg_gen_ld_i64(in0, cpu_env, aofs); + tcg_gen_ld_i64(in1, cpu_env, aofs + 8); + for (i = 0; i < oprsz; i += 16) { + tcg_gen_st_i64(in0, cpu_env, dofs + i); + tcg_gen_st_i64(in1, cpu_env, dofs + i + 8); + } + tcg_temp_free_i64(in0); + tcg_temp_free_i64(in1); + } + } +} + +void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint64_t x) +{ + check_size_align(oprsz, maxsz, dofs); + do_dup_i64(MO_64, dofs, oprsz, maxsz, NULL, x); +} + +void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint32_t x) +{ + if (TCG_TARGET_REG_BITS == 64) { + do_dup_i64(MO_64, dofs, oprsz, maxsz, NULL, deposit64(x, 32, 32, x)); + } else { + do_dup_i32(MO_32, dofs, oprsz, maxsz, NULL, x, gen_helper_gvec_dup32); + } +} + +void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint16_t x) +{ + tcg_gen_gvec_dup32i(dofs, oprsz, maxsz, 0x00010001 * x); +} + +void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t oprsz, + uint32_t maxsz, uint8_t x) +{ + tcg_gen_gvec_dup32i(dofs, oprsz, maxsz, 0x01010101 * x); +} + +void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2 g = { + .fni8 = tcg_gen_not_i64, + .fniv = tcg_gen_not_vec, + .fno = gen_helper_gvec_not, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &g); +} + +/* Perform a vector addition using normal addition and a mask. The mask + should be the sign bit of each lane. This 6-operation form is more + efficient than separate additions when there are 4 or more lanes in + the 64-bit operation. */ +static void gen_addv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + + tcg_gen_andc_i64(t1, a, m); + tcg_gen_andc_i64(t2, b, m); + tcg_gen_xor_i64(t3, a, b); + tcg_gen_add_i64(d, t1, t2); + tcg_gen_and_i64(t3, t3, m); + tcg_gen_xor_i64(d, d, t3); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t3); +} + +void tcg_gen_vec_add8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP8(0x80)); + gen_addv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_add16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP16(0x8000)); + gen_addv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + + tcg_gen_andi_i64(t1, a, ~0xffffffffull); + tcg_gen_add_i64(t2, a, b); + tcg_gen_add_i64(t1, t1, b); + tcg_gen_deposit_i64(d, t1, t2, 0, 32); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = tcg_gen_vec_add8_i64, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add8, + .opc = INDEX_op_add_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_add16_i64, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add16, + .opc = INDEX_op_add_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_add_i32, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add32, + .opc = INDEX_op_add_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_add_i64, + .fniv = tcg_gen_add_vec, + .fno = gen_helper_gvec_add64, + .opc = INDEX_op_add_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); +} + +/* Perform a vector subtraction using normal subtraction and a mask. + Compare gen_addv_mask above. */ +static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + + tcg_gen_or_i64(t1, a, m); + tcg_gen_andc_i64(t2, b, m); + tcg_gen_eqv_i64(t3, a, b); + tcg_gen_sub_i64(d, t1, t2); + tcg_gen_and_i64(t3, t3, m); + tcg_gen_xor_i64(d, d, t3); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t3); +} + +void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP8(0x80)); + gen_subv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP16(0x8000)); + gen_subv_mask(d, a, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + + tcg_gen_andi_i64(t1, b, ~0xffffffffull); + tcg_gen_sub_i64(t2, a, b); + tcg_gen_sub_i64(t1, a, t1); + tcg_gen_deposit_i64(d, t1, t2, 0, 32); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = tcg_gen_vec_sub8_i64, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub8, + .opc = INDEX_op_sub_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_sub16_i64, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub16, + .opc = INDEX_op_sub_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_sub_i32, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub32, + .opc = INDEX_op_sub_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_sub_i64, + .fniv = tcg_gen_sub_vec, + .fno = gen_helper_gvec_sub64, + .opc = INDEX_op_sub_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); +} + +/* Perform a vector negation using normal negation and a mask. + Compare gen_subv_mask above. */ +static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m) +{ + TCGv_i64 t2 = tcg_temp_new_i64(); + TCGv_i64 t3 = tcg_temp_new_i64(); + + tcg_gen_andc_i64(t3, m, b); + tcg_gen_andc_i64(t2, b, m); + tcg_gen_sub_i64(d, m, t2); + tcg_gen_xor_i64(d, d, t3); + + tcg_temp_free_i64(t2); + tcg_temp_free_i64(t3); +} + +void tcg_gen_vec_neg8_i64(TCGv_i64 d, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP8(0x80)); + gen_negv_mask(d, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_neg16_i64(TCGv_i64 d, TCGv_i64 b) +{ + TCGv_i64 m = tcg_const_i64(REP16(0x8000)); + gen_negv_mask(d, b, m); + tcg_temp_free_i64(m); +} + +void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 b) +{ + TCGv_i64 t1 = tcg_temp_new_i64(); + TCGv_i64 t2 = tcg_temp_new_i64(); + + tcg_gen_andi_i64(t1, b, ~0xffffffffull); + tcg_gen_neg_i64(t2, b); + tcg_gen_neg_i64(t1, t1); + tcg_gen_deposit_i64(d, t1, t2, 0, 32); + + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t2); +} + +void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2 g[4] = { + { .fni8 = tcg_gen_vec_neg8_i64, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg8, + .opc = INDEX_op_neg_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_neg16_i64, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg16, + .opc = INDEX_op_neg_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_neg_i32, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg32, + .opc = INDEX_op_neg_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_neg_i64, + .fniv = tcg_gen_neg_vec, + .fno = gen_helper_gvec_neg64, + .opc = INDEX_op_neg_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &g[vece]); +} + +void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_and_i64, + .fniv = tcg_gen_and_vec, + .fno = gen_helper_gvec_and, + .opc = INDEX_op_and_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); +} + +void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_or_i64, + .fniv = tcg_gen_or_vec, + .fno = gen_helper_gvec_or, + .opc = INDEX_op_or_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); +} + +void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_xor_i64, + .fniv = tcg_gen_xor_vec, + .fno = gen_helper_gvec_xor, + .opc = INDEX_op_xor_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); +} + +void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_andc_i64, + .fniv = tcg_gen_andc_vec, + .fno = gen_helper_gvec_andc, + .opc = INDEX_op_andc_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); +} + +void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g = { + .fni8 = tcg_gen_orc_i64, + .fniv = tcg_gen_orc_vec, + .fno = gen_helper_gvec_orc, + .opc = INDEX_op_orc_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + }; + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index dc04c11860..5cfe4af6bd 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -104,7 +104,7 @@ void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a) #define MO_REG (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32) -static void tcg_gen_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) +static void do_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) { TCGTemp *rt = tcgv_vec_temp(r); vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a); @@ -113,14 +113,14 @@ static void tcg_gen_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a) TCGv_vec tcg_const_zeros_vec(TCGType type) { TCGv_vec ret = tcg_temp_new_vec(type); - tcg_gen_dupi_vec(ret, MO_REG, 0); + do_dupi_vec(ret, MO_REG, 0); return ret; } TCGv_vec tcg_const_ones_vec(TCGType type) { TCGv_vec ret = tcg_temp_new_vec(type); - tcg_gen_dupi_vec(ret, MO_REG, -1); + do_dupi_vec(ret, MO_REG, -1); return ret; } @@ -139,9 +139,9 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m) void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a) { if (TCG_TARGET_REG_BITS == 32 && a == deposit64(a, 32, 32, a)) { - tcg_gen_dupi_vec(r, MO_32, a); + do_dupi_vec(r, MO_32, a); } else if (TCG_TARGET_REG_BITS == 64 || a == (uint64_t)(int32_t)a) { - tcg_gen_dupi_vec(r, MO_64, a); + do_dupi_vec(r, MO_64, a); } else { TCGv_i64 c = tcg_const_i64(a); tcg_gen_dup_i64_vec(MO_64, r, c); @@ -151,17 +151,37 @@ void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a) void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a) { - tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffffffffu) * a); + do_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffffffffu) * a); } void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a) { - tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffff) * (a & 0xffff)); + do_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xffff) * (a & 0xffff)); } void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a) { - tcg_gen_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xff) * (a & 0xff)); + do_dupi_vec(r, MO_REG, ((TCGArg)-1 / 0xff) * (a & 0xff)); +} + +void tcg_gen_dupi_vec(unsigned vece, TCGv_vec r, uint64_t a) +{ + switch (vece) { + case MO_8: + tcg_gen_dup8i_vec(r, a); + break; + case MO_16: + tcg_gen_dup16i_vec(r, a); + break; + case MO_32: + tcg_gen_dup32i_vec(r, a); + break; + case MO_64: + tcg_gen_dup64i_vec(r, a); + break; + default: + g_assert_not_reached(); + } } void tcg_gen_movi_v64(TCGv_vec r, uint64_t a) diff --git a/tcg/tcg.c b/tcg/tcg.c index 16b8faf66f..b4f8938fb0 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1404,10 +1404,10 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; - case NB_OPS: - break; + default: + tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS); + return true; } - g_assert_not_reached(); } /* Note: we convert the 64 bit args to 32 bit and do some alignment @@ -3737,3 +3737,10 @@ void tcg_register_jit(void *buf, size_t buf_size) { } #endif /* ELF_HOST_MACHINE */ + +#if !TCG_TARGET_MAYBE_vec +void tcg_expand_vec_op(TCGOpcode o, TCGType t, unsigned e, TCGArg a0, ...) +{ + g_assert_not_reached(); +} +#endif diff --git a/accel/tcg/Makefile.objs b/accel/tcg/Makefile.objs index 228cd84fa4..d381a02f34 100644 --- a/accel/tcg/Makefile.objs +++ b/accel/tcg/Makefile.objs @@ -1,6 +1,6 @@ obj-$(CONFIG_SOFTMMU) += tcg-all.o obj-$(CONFIG_SOFTMMU) += cputlb.o -obj-y += tcg-runtime.o +obj-y += tcg-runtime.o tcg-runtime-gvec.o obj-y += cpu-exec.o cpu-exec-common.o translate-all.o obj-y += translator.o From patchwork Sat Jan 6 03:13:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123601 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp121584qgn; Fri, 5 Jan 2018 19:14:34 -0800 (PST) X-Google-Smtp-Source: ACJfBou5sJFmHTIrKcAUUrS/2zlBfsftNQvTtUom8FAkk5UkkOS1xq0f1WeSZ6nI7YXgVguqsaO5 X-Received: by 10.129.94.7 with SMTP id s7mr4602596ywb.63.1515208474045; Fri, 05 Jan 2018 19:14:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208474; cv=none; d=google.com; s=arc-20160816; b=ybnYB/RPWUlvp6TjQjVJBfMZgdDNojf+LyiqPrqGAxDLBO++ea84ZiDnCXHSisdV9n gbxAcx3RKMZjMC09SDuHviGt+T/rZ1gFAi8EU/6KrbCLVHMPE5Js1penDcRel9XxAoIj l2Ttq2Adg1QGyMw0JYfaw7qm5Gzr4r6ZiYtIt2HshsItHXzPJWO7+eaCmaBQ00rvRZYL sU2BqMSH3qwE+WJo81qtGuWDMKtNW25sl6LMBE8imaA2MafigJR/OGbcXDGh+UKLWkZc +kxYjJfUhiDMM8yUJRmNI9nY+I55Tzz0EnbIyLBSKKX6SRyA25+oNSnkUf3gHXQGn7d5 E1Fg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=Qr/mdYIXzaibD/aGf+TjVpNhfFLYB42cPYlYCbDvgnI=; b=1BiyaAwSD/lIPNSr4mhyI1umq30EeB8Ty36hZlZa86W+Z4B+r7v8uEO4rqTRce4grh tVE0oPSS0hOOkyO6yaRPgLNfrGCmmWKNa81BusQQD6+QWpq8gQgsEqKQCsBJwVEAB5lK Qg5OSKZq1sP146ySmNdYdX7IBzxYPmb6zDojEtz1eUn4dJ/WP27C3mvYHryplGwx0G/y MGMUUeG9XECWJ5AQW1Ks1voDMoOllh8KeRsn4wRbrzOPRAmy+jucSyc9iEzC6GP4Ge6G yMYBE3tgl7D/8Nb6bGS0AGG5Wllh9UEXO3ZxUbuwP/8q1L2FwfZR5SkZBIHNUPRfzPrn 2nRw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=agnM00sI; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id k10si507352ywk.423.2018.01.05.19.14.33 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:14:34 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=agnM00sI; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44024 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXewL-0007U8-E6 for patch@linaro.org; Fri, 05 Jan 2018 22:14:33 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48241) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevr-0007RC-0E for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:06 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevn-0004pk-Sc for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:02 -0500 Received: from mail-pg0-x244.google.com ([2607:f8b0:400e:c05::244]:32998) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevn-0004oC-J2 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:13:59 -0500 Received: by mail-pg0-x244.google.com with SMTP id i196so2716433pgd.0 for ; Fri, 05 Jan 2018 19:13:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=Qr/mdYIXzaibD/aGf+TjVpNhfFLYB42cPYlYCbDvgnI=; b=agnM00sIp9JlvAYZni3tOZnlrxaZHWgeFOJvk8QctmxFooRzP2NYPx3P4Swq7kcN3h MHYlTCVjotrjqzSTKPz/M3hAy1uBZ6h8YZLHZ0M/6eOR1t3uRfnTtR08U1saC7w18a94 vuYDYmqD5M/RitnTE7OymrrG+LUGJEhm6YEI4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=Qr/mdYIXzaibD/aGf+TjVpNhfFLYB42cPYlYCbDvgnI=; b=OuXt+DI67fhPADCgGc9/wqECHlVy1+gXUBzpEITW8zgXvA7nnNMIgY9R8yFOx0ydBa ICzYZS3GWQge3x8Ddm4TGRA220fCZ8kfDDKvIwMb3wnxlf4ueVVEq3AiFRKAcg8/77Nq jfe4J2ykn8nRrJXWpLcMz15i1M/f6aRJlSinc9O6FLvL2xeRm6V4DcBNTdKY74oDmO6R 5tu+aPtv/smTgan0Nr8BLwKScDZV9faeQb+cQP6x1f9iSbYlz8+7ENPuwwA7MbR4FuXX NcrcbHNP3N7ZjTaYOB5gQBkfZlxu8MgJgdhofQq8BFIRFT4yaaVS5JF9DzHdZ1jwnNbf 9aIg== X-Gm-Message-State: AKGB3mILUZkvJ6bLXuNinDshLEKn8XHfnqjc/ifd3pa3wdrJVE7yAgy2 G4afPcR+qjTzU4nvNMmuGxAFQ02FO4E= X-Received: by 10.98.210.133 with SMTP id c127mr4735823pfg.160.1515208437979; Fri, 05 Jan 2018 19:13:57 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.13.56 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:13:56 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:28 -0800 Message-Id: <20180106031346.6650-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::244 Subject: [Qemu-devel] [PATCH v8 05/23] tcg: Add generic vector ops for interleave X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Includes zip, unzip, and transform. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 15 ++ tcg/tcg-op-gvec.h | 17 +++ tcg/tcg-op.h | 6 + tcg/tcg-opc.h | 7 + tcg/tcg.h | 3 + accel/tcg/tcg-runtime-gvec.c | 78 +++++++++++ tcg/tcg-op-gvec.c | 317 ++++++++++++++++++++++++++++++++++++++++++- tcg/tcg-op-vec.c | 55 ++++++++ tcg/tcg.c | 9 ++ tcg/README | 40 ++++++ 10 files changed, 545 insertions(+), 2 deletions(-) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index 76ee41ce58..c6de749134 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -163,3 +163,18 @@ DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_zip8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_zip16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_zip32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_zip64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_uzp8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_uzp16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_uzp32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_uzp64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_trn8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_trn16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_trn32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_trn64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 9d795fbb30..1d97e80e18 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -87,6 +87,8 @@ typedef struct { gen_helper_gvec_2 *fno; /* The opcode, if any, to which this corresponds. */ TCGOpcode opc; + /* The data argument to the out-of-line helper. */ + uint32_t data; /* The vector element size, if applicable. */ uint8_t vece; /* Prefer i64 to v64. */ @@ -104,6 +106,8 @@ typedef struct { gen_helper_gvec_3 *fno; /* The opcode, if any, to which this corresponds. */ TCGOpcode opc; + /* The data argument to the out-of-line helper. */ + uint32_t data; /* The vector element size, if applicable. */ uint8_t vece; /* Prefer i64 to v64. */ @@ -175,6 +179,19 @@ void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x); void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x); void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x); +void tcg_gen_gvec_zipl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_ziph(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_uzpe(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_uzpo(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); + /* * 64-bit vector operations. Use these when the register has been allocated * with tcg_global_mem_new_i64, and so we cannot also address it via pointer. diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 09fec5693f..87be11a90d 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -927,6 +927,12 @@ void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a); void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_zipl_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_ziph_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_uzpe_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index b4e16cfbc3..c911d62442 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -229,6 +229,13 @@ DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) +DEF(zipl_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) +DEF(ziph_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) +DEF(uzpe_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) +DEF(uzpo_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) +DEF(trne_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) +DEF(trno_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) + DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) #if TCG_TARGET_MAYBE_vec diff --git a/tcg/tcg.h b/tcg/tcg.h index 0e9d79bdd4..d0678ac769 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -178,6 +178,9 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_andc_vec 0 #define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_zip_vec 0 +#define TCG_TARGET_HAS_uzp_vec 0 +#define TCG_TARGET_HAS_trn_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index cd1ce12b7e..628df811b2 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -293,3 +293,81 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) } clear_high(d, oprsz, desc); } + +/* The size of the alloca in the following is currently bounded to 2k. */ + +#define DO_ZIP(NAME, TYPE) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t oprsz_2 = oprsz / 2; \ + intptr_t i; \ + /* We produce output faster than we consume input. \ + Therefore we must be mindful of possible overlap. */ \ + if (unlikely((a - d) < (uintptr_t)oprsz)) { \ + void *a_new = alloca(oprsz_2); \ + memcpy(a_new, a, oprsz_2); \ + a = a_new; \ + } \ + if (unlikely((b - d) < (uintptr_t)oprsz)) { \ + void *b_new = alloca(oprsz_2); \ + memcpy(b_new, b, oprsz_2); \ + b = b_new; \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ + *(TYPE *)(d + 2 * i + 0) = *(TYPE *)(a + i); \ + *(TYPE *)(d + 2 * i + sizeof(TYPE)) = *(TYPE *)(b + i); \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_ZIP(gvec_zip8, uint8_t) +DO_ZIP(gvec_zip16, uint16_t) +DO_ZIP(gvec_zip32, uint32_t) +DO_ZIP(gvec_zip64, uint64_t) + +#define DO_UZP(NAME, TYPE) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t oprsz_2 = oprsz / 2; \ + intptr_t odd_ofs = simd_data(desc); \ + intptr_t i; \ + if (unlikely((b - d) < (uintptr_t)oprsz)) { \ + void *b_new = alloca(oprsz); \ + memcpy(b_new, b, oprsz); \ + b = b_new; \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ + *(TYPE *)(d + i) = *(TYPE *)(a + 2 * i + odd_ofs); \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ + *(TYPE *)(d + oprsz_2 + i) = *(TYPE *)(b + 2 * i + odd_ofs); \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_UZP(gvec_uzp8, uint8_t) +DO_UZP(gvec_uzp16, uint16_t) +DO_UZP(gvec_uzp32, uint32_t) +DO_UZP(gvec_uzp64, uint64_t) + +#define DO_TRN(NAME, TYPE) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t odd_ofs = simd_data(desc); \ + intptr_t i; \ + for (i = 0; i < oprsz; i += 2 * sizeof(TYPE)) { \ + TYPE ae = *(TYPE *)(a + i + odd_ofs); \ + TYPE be = *(TYPE *)(b + i + odd_ofs); \ + *(TYPE *)(d + i + 0) = ae; \ + *(TYPE *)(d + i + sizeof(TYPE)) = be; \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_TRN(gvec_trn8, uint8_t) +DO_TRN(gvec_trn16, uint16_t) +DO_TRN(gvec_trn32, uint32_t) +DO_TRN(gvec_trn64, uint64_t) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 2b7f4c8544..c4ecef0e1e 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -780,7 +780,7 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, } do_ool: - tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, g->fno); + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno); } /* Expand a vector three-operand operation. */ @@ -864,7 +864,7 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, } do_ool: - tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, g->fno); + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, g->data, g->fno); } /* Expand a vector four-operand operation. */ @@ -1401,3 +1401,316 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, }; tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); } + +static void do_zip(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz, + bool high) +{ + static gen_helper_gvec_3 * const zip_fns[4] = { + gen_helper_gvec_zip8, + gen_helper_gvec_zip16, + gen_helper_gvec_zip32, + gen_helper_gvec_zip64, + }; + + TCGType type; + uint32_t step, i, n; + TCGOpcode zip_op; + + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, oprsz); + tcg_debug_assert(vece <= MO_64); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > 4 * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + zip_op = high ? INDEX_op_ziph_vec : INDEX_op_zipl_vec; + + /* Since these operations don't operate in lock-step lanes, + we must care for overlap. */ + if (TCG_TARGET_HAS_v256 && oprsz % 32 == 0 && oprsz / 32 <= 8 + && tcg_can_emit_vec_op(zip_op, TCG_TYPE_V256, vece)) { + type = TCG_TYPE_V256; + step = 32; + n = oprsz / 32; + } else if (TCG_TARGET_HAS_v128 && oprsz % 16 == 0 && oprsz / 16 <= 8 + && tcg_can_emit_vec_op(zip_op, TCG_TYPE_V128, vece)) { + type = TCG_TYPE_V128; + step = 16; + n = oprsz / 16; + } else if (TCG_TARGET_HAS_v64 && oprsz % 8 == 0 && oprsz / 8 <= 8 + && tcg_can_emit_vec_op(zip_op, TCG_TYPE_V64, vece)) { + type = TCG_TYPE_V64; + step = 8; + n = oprsz / 8; + } else { + goto do_ool; + } + + if (n == 1) { + TCGv_vec t1 = tcg_temp_new_vec(type); + TCGv_vec t2 = tcg_temp_new_vec(type); + + tcg_gen_ld_vec(t1, cpu_env, aofs); + tcg_gen_ld_vec(t2, cpu_env, bofs); + if (high) { + tcg_gen_ziph_vec(vece, t1, t1, t2); + } else { + tcg_gen_zipl_vec(vece, t1, t1, t2); + } + tcg_gen_st_vec(t1, cpu_env, dofs); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + } else { + TCGv_vec ta[4], tb[4], tmp; + + if (high) { + aofs += oprsz / 2; + bofs += oprsz / 2; + } + + for (i = 0; i < (n / 2 + n % 2); ++i) { + ta[i] = tcg_temp_new_vec(type); + tb[i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(ta[i], cpu_env, aofs + i * step); + tcg_gen_ld_vec(tb[i], cpu_env, bofs + i * step); + } + + tmp = tcg_temp_new_vec(type); + for (i = 0; i < n; ++i) { + if (i & 1) { + tcg_gen_ziph_vec(vece, tmp, ta[i / 2], tb[i / 2]); + } else { + tcg_gen_zipl_vec(vece, tmp, ta[i / 2], tb[i / 2]); + } + tcg_gen_st_vec(tmp, cpu_env, dofs + i * step); + } + tcg_temp_free_vec(tmp); + + for (i = 0; i < (n / 2 + n % 2); ++i) { + tcg_temp_free_vec(ta[i]); + tcg_temp_free_vec(tb[i]); + } + } + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + return; + + do_ool: + if (high) { + aofs += oprsz / 2; + bofs += oprsz / 2; + } + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, zip_fns[vece]); +} + +void tcg_gen_gvec_zipl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_zip(vece, dofs, aofs, bofs, oprsz, maxsz, false); +} + +void tcg_gen_gvec_ziph(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_zip(vece, dofs, aofs, bofs, oprsz, maxsz, true); +} + +static void do_uzp(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz, bool odd) +{ + static gen_helper_gvec_3 * const uzp_fns[4] = { + gen_helper_gvec_uzp8, + gen_helper_gvec_uzp16, + gen_helper_gvec_uzp32, + gen_helper_gvec_uzp64, + }; + + TCGType type; + uint32_t step, i, n; + TCGv_vec t[8]; + TCGOpcode uzp_op; + + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, oprsz); + tcg_debug_assert(vece <= MO_64); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > 4 * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + uzp_op = odd ? INDEX_op_uzpo_vec : INDEX_op_uzpe_vec; + + /* Since these operations don't operate in lock-step lanes, + we must care for overlap. */ + if (TCG_TARGET_HAS_v256 && oprsz % 32 == 0 && oprsz / 32 <= 4 + && tcg_can_emit_vec_op(uzp_op, TCG_TYPE_V256, vece)) { + type = TCG_TYPE_V256; + step = 32; + n = oprsz / 32; + } else if (TCG_TARGET_HAS_v128 && oprsz % 16 == 0 && oprsz / 16 <= 4 + && tcg_can_emit_vec_op(uzp_op, TCG_TYPE_V128, vece)) { + type = TCG_TYPE_V128; + step = 16; + n = oprsz / 16; + } else if (TCG_TARGET_HAS_v64 && oprsz % 8 == 0 && oprsz / 8 <= 4 + && tcg_can_emit_vec_op(uzp_op, TCG_TYPE_V64, vece)) { + type = TCG_TYPE_V64; + step = 8; + n = oprsz / 8; + } else { + goto do_ool; + } + + for (i = 0; i < n; ++i) { + t[i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(t[i], cpu_env, aofs + i * step); + } + for (i = 0; i < n; ++i) { + t[n + i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(t[n + i], cpu_env, bofs + i * step); + } + for (i = 0; i < n; ++i) { + if (odd) { + tcg_gen_uzpo_vec(vece, t[2 * i], t[2 * i], t[2 * i + 1]); + } else { + tcg_gen_uzpe_vec(vece, t[2 * i], t[2 * i], t[2 * i + 1]); + } + tcg_gen_st_vec(t[2 * i], cpu_env, dofs + i * step); + tcg_temp_free_vec(t[2 * i]); + tcg_temp_free_vec(t[2 * i + 1]); + } + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + return; + + do_ool: + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, + (1 << vece) * odd, uzp_fns[vece]); +} + +void tcg_gen_gvec_uzpe(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_uzp(vece, dofs, aofs, bofs, oprsz, maxsz, false); +} + +void tcg_gen_gvec_uzpo(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + do_uzp(vece, dofs, aofs, bofs, oprsz, maxsz, true); +} + +static void gen_trne8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0x00ff00ff00ff00ffull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shli_i64(b, b, 8); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trne16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0x0000ffff0000ffffull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shli_i64(b, b, 16); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trne32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_deposit_i64(d, a, b, 32, 32); +} + +void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = gen_trne8_i64, + .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn8, + .opc = INDEX_op_trne_vec, + .vece = MO_8 }, + { .fni8 = gen_trne16_i64, + .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn16, + .opc = INDEX_op_trne_vec, + .vece = MO_16 }, + { .fni8 = gen_trne32_i64, + .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn32, + .opc = INDEX_op_trne_vec, + .vece = MO_32 }, + { .fniv = tcg_gen_trne_vec, + .fno = gen_helper_gvec_trn64, + .opc = INDEX_op_trne_vec, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); +} + +static void gen_trno8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0xff00ff00ff00ff00ull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shri_i64(a, a, 8); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trno16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + uint64_t m = 0xffff0000ffff0000ull; + tcg_gen_andi_i64(a, a, m); + tcg_gen_andi_i64(b, b, m); + tcg_gen_shri_i64(a, a, 16); + tcg_gen_or_i64(d, a, b); +} + +static void gen_trno32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_shri_i64(a, a, 32); + tcg_gen_deposit_i64(d, b, a, 0, 32); +} + +void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fni8 = gen_trno8_i64, + .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn8, + .opc = INDEX_op_trno_vec, + .data = 1, + .vece = MO_8 }, + { .fni8 = gen_trno16_i64, + .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn16, + .opc = INDEX_op_trno_vec, + .data = 2, + .vece = MO_16 }, + { .fni8 = gen_trno32_i64, + .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn32, + .opc = INDEX_op_trno_vec, + .data = 4, + .vece = MO_32 }, + { .fniv = tcg_gen_trno_vec, + .fno = gen_helper_gvec_trn64, + .opc = INDEX_op_trno_vec, + .data = 8, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index 5cfe4af6bd..a5d0ff89c3 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -380,3 +380,58 @@ void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a) tcg_temp_free_vec(t); } } + +static void do_interleave(TCGOpcode opc, unsigned vece, + TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGArg bi = temp_arg(bt); + TCGType type = rt->base_type; + unsigned vecl = type - TCG_TYPE_V64; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + tcg_debug_assert((8 << vece) <= (32 << vecl)); + can = tcg_can_emit_vec_op(opc, type, vece); + if (can > 0) { + vec_gen_3(opc, type, vece, ri, ai, bi); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(opc, type, vece, ri, ai, bi); + } +} + +void tcg_gen_zipl_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_zipl_vec, vece, r, a, b); +} + +void tcg_gen_ziph_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_ziph_vec, vece, r, a, b); +} + +void tcg_gen_uzpe_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_uzpe_vec, vece, r, a, b); +} + +void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_uzpo_vec, vece, r, a, b); +} + +void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_trne_vec, vece, r, a, b); +} + +void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + do_interleave(INDEX_op_trno_vec, vece, r, a, b); +} diff --git a/tcg/tcg.c b/tcg/tcg.c index b4f8938fb0..33e3c03cbc 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1403,6 +1403,15 @@ bool tcg_op_supported(TCGOpcode op) return have_vec && TCG_TARGET_HAS_andc_vec; case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + return have_vec && TCG_TARGET_HAS_zip_vec; + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + return have_vec && TCG_TARGET_HAS_uzp_vec; + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + return have_vec && TCG_TARGET_HAS_trn_vec; default: tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS); diff --git a/tcg/README b/tcg/README index e14990fb9b..8ab8d3ab7e 100644 --- a/tcg/README +++ b/tcg/README @@ -561,6 +561,46 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. Similarly, logical operations with and without compliment. Note that VECE is unused. +* zipl_vec v0, v1, v2 +* ziph_vec v0, v1, v2 + + "Zip" two vectors together, either the low half of v1/v2 or the high half. + The name comes from ARM ARM; the equivalent function in Intel terminology + is the less scrutable "punpck". The effect is + + part = ("high" ? VECL/VECE/2 : 0); + for (i = 0; i < VECL/VECE/2; ++i) { + v0[2i + 0] = v1[i + part]; + v0[2i + 1] = v2[i + part]; + } + +* uzpe_vec v0, v1, v2 +* uzpo_vec v0, v1, v2 + + "Unzip" two vectors, either the even elements or the odd elements. + If v1 and v2 are the result of zipl and ziph, this performs the + inverse operation. The effect is + + part = ("odd" ? 1 : 0) + for (i = 0; i < VECL/VECE/2; ++i) { + v0[i] = v1[2i + part]; + } + for (i = 0; i < VECL/VECE/2; ++i) { + v0[i + VECL/VECE/2] = v1[2i + part]; + } + +* trne_vec v0, v1, v2 +* trno_vec v1, v1, v2 + + "Transpose" two vectors, either the even elements or the odd elements. + The effect is + + part = ("odd" ? 1 : 0) + for (i = 0; i < VECL/VECE/2; ++i) { + v0[2i + 0] = v1[2i + part]; + v0[2i + 1] = v2[2i + part]; + } + ********* Note 1: Some shortcuts are defined when the last operand is known to be From patchwork Sat Jan 6 03:13:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123618 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp130273qgn; Fri, 5 Jan 2018 19:28:18 -0800 (PST) X-Google-Smtp-Source: ACJfBotiL6q+aaxEGcaeREbBFsOxOiat3YzQv2REGDpnFBFzXNAKwiVDS+MsbKMXQv0QIDbniXqr X-Received: by 10.37.187.65 with SMTP id b1mr4715541ybk.172.1515209298812; Fri, 05 Jan 2018 19:28:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209298; cv=none; d=google.com; s=arc-20160816; b=YLxhwOT93SnA4pfP8BAtpGTLPSOgbaStBK4cSqC4oeMg2BpI3fQt2d5R+/FgPpY+a+ 1Sc/BsrCT6Da3RYZ037F7OFNr9spbirxWbNznkf3T4zBXBS5LQiLm8CNWpuhRI9eymgF U7ZsS0iWlu5qxcOTxDjmkVxGz2zg7ai6cwYDzZzVavf07owj3J1DCNFGVgMX0jSwtDc2 h5XJfn+LZJGnQNKo6YVxWyZ61YvBghIRUcypJMcoHp3nj3H2YboB8IanbDaOWqBC11AU OoN3zOxeEYCPiZiIwhkhDJJgisKsxTOh2FTuAoLEJlovDJU0pPJLnX447q7sCTMniV7r 6nPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=UhQ+K/8dMhQ+hc9amTD8O8f/RhDo7kiOfWpd55cwWCk=; b=bDeouPEhMf1h7RiFPIsO+FYgZS/IZ3hfqPCEmoIvaRMPl6C6iCcFAW+cT5Bv3wt93D nMp5bRx1UP4YMtC+cIOScyLmuyAcyieDCCBQ/4XVffRtG3OR42wrsTXZa6POp7KYfaFG jMd8X2s8yl5la0X6E6tqUsGwYcXZV4YXPbVQGTAgAvcisRZ+oALegcYWwhloMsjEshjw PrdG7fhMOnDBYIAONvtzma1vFBDeoWdeno5sjqObA8RRv+NX4YDzKp6gCgrQOgXdg0Hj H8n8DkPgDscO5IyXranr3zSu1y/CxFYw4oKk5qrfIENG9i0253YXRZx+WXss/0lR2BrF JYbg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=gJlsR0U0; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id w18si1479021ybl.446.2018.01.05.19.28.18 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:28:18 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=gJlsR0U0; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44106 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf9e-0001jE-6x for patch@linaro.org; Fri, 05 Jan 2018 22:28:18 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48272) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevu-0007TY-23 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:09 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevp-0004vT-UR for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:06 -0500 Received: from mail-pg0-x241.google.com ([2607:f8b0:400e:c05::241]:32996) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevp-0004sU-Js for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:01 -0500 Received: by mail-pg0-x241.google.com with SMTP id i196so2716454pgd.0 for ; Fri, 05 Jan 2018 19:14:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=UhQ+K/8dMhQ+hc9amTD8O8f/RhDo7kiOfWpd55cwWCk=; b=gJlsR0U0zaWgQnuXqWbXWUL1zWAxcorueTfLJ59KJ2QsZsjnhlFvzukjv0lKsFl+4v +wGt0FtFa0YP03M5pTAHJI6M1t7XT+5j8Rmiummyqc8Az0n2KVs2ODqA4Z9OF1O5EAPq 7VWRHa9rUJ373VGp5lsWl3JpxL2sQ7Vqv82Jo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=UhQ+K/8dMhQ+hc9amTD8O8f/RhDo7kiOfWpd55cwWCk=; b=l2aXavNAYzdLVHmoBU8K/pk9iPJhU5Zzd/g8JnW5Btq4+7zDiuNrS115QFMmyE+N3f 3UPHzdc/ENL3YT7gdYMHLFR5R9N598/iEtq++0zm7nh1P3J1zheAwswik60BShkh+vL0 +57/4b2zTd+/dp2+QAvLR3bgDONmL/VZToHX58Sk0iyFHZI9ZyMk/8uKPMtUthzQybjv nFdvFD05or8X1okq6ZdRIrBuFBaSP1fwdWHJicak9Ke+TnTrYSuomBUBfvi8ZaduBhMJ qAZbYIHBXnoKBTB7cxy+lZBRlM+o6syQ7N4iMiRmoexatGoAvIyZpKoVpohvrKrmuB0s OQow== X-Gm-Message-State: AKGB3mKVSHmQyPkuzPyvNuN6u6dSItsVI3///UFf7hriFXQTnTDLdpoG 3cXKS3ATP6q31vYo5E4pm6OR019ewXY= X-Received: by 10.98.161.21 with SMTP id b21mr4753431pff.113.1515208439925; Fri, 05 Jan 2018 19:13:59 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.13.58 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:13:58 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:29 -0800 Message-Id: <20180106031346.6650-7-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::241 Subject: [Qemu-devel] [PATCH v8 06/23] tcg: Add generic vector ops for constant shifts X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 15 ++ tcg/tcg-op-gvec.h | 35 +++++ tcg/tcg-op.h | 5 + tcg/tcg-opc.h | 12 ++ tcg/tcg.h | 3 + accel/tcg/tcg-runtime-gvec.c | 149 ++++++++++++++++++++ tcg/tcg-op-gvec.c | 318 ++++++++++++++++++++++++++++++++++++++++++- tcg/tcg-op-vec.c | 45 ++++++ tcg/tcg.c | 12 ++ tcg/README | 29 ++++ 10 files changed, 617 insertions(+), 6 deletions(-) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index c6de749134..cb05a755b8 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -164,6 +164,21 @@ DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shl64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_shr8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_shr64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_sar8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_zip8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_zip16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_zip32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 1d97e80e18..7ca78853f2 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -95,6 +95,25 @@ typedef struct { bool prefer_i64; } GVecGen2; +typedef struct { + /* Expand inline as a 64-bit or 32-bit integer. + Only one of these will be non-NULL. */ + void (*fni8)(TCGv_i64, TCGv_i64, int64_t); + void (*fni4)(TCGv_i32, TCGv_i32, int32_t); + /* Expand inline with a host vector type. */ + void (*fniv)(unsigned, TCGv_vec, TCGv_vec, int64_t); + /* Expand out-of-line helper w/descriptor. */ + gen_helper_gvec_2 *fno; + /* The opcode, if any, to which this corresponds. */ + TCGOpcode opc; + /* The vector element size, if applicable. */ + uint8_t vece; + /* Prefer i64 to v64. */ + bool prefer_i64; + /* Load dest as a 3rd source operand. */ + bool load_dest; +} GVecGen2i; + typedef struct { /* Expand inline as a 64-bit or 32-bit integer. Only one of these will be non-NULL. */ @@ -137,6 +156,8 @@ typedef struct { void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz, const GVecGen2 *); +void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + uint32_t maxsz, int64_t c, const GVecGen2i *); void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, const GVecGen3 *); void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs, @@ -179,6 +200,13 @@ void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x); void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x); void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x); +void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift); +void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift); +void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift); + void tcg_gen_gvec_zipl(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); void tcg_gen_gvec_ziph(unsigned vece, uint32_t dofs, uint32_t aofs, @@ -209,3 +237,10 @@ void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); void tcg_gen_vec_sub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); + +void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); +void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 87be11a90d..c002523bea 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -927,6 +927,11 @@ void tcg_gen_andc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_orc_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a); void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a); + +void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i); +void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i); +void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i); + void tcg_gen_zipl_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_ziph_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_uzpe_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index c911d62442..a085fc077b 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -229,6 +229,18 @@ DEF(andc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_andc_vec)) DEF(orc_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_orc_vec)) DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec)) +DEF(shli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) +DEF(shri_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) +DEF(sari_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec)) + +DEF(shls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) +DEF(shrs_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) +DEF(sars_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec)) + +DEF(shlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) +DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) +DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec)) + DEF(zipl_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) DEF(ziph_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_zip_vec)) DEF(uzpe_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) diff --git a/tcg/tcg.h b/tcg/tcg.h index d0678ac769..bdfe058c45 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -178,6 +178,9 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_not_vec 0 #define TCG_TARGET_HAS_andc_vec 0 #define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_shi_vec 0 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 #define TCG_TARGET_HAS_zip_vec 0 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index 628df811b2..fba62f1192 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -36,6 +36,11 @@ typedef uint16_t vec16 __attribute__((vector_size(16))); typedef uint32_t vec32 __attribute__((vector_size(16))); typedef uint64_t vec64 __attribute__((vector_size(16))); +typedef int8_t svec8 __attribute__((vector_size(16))); +typedef int16_t svec16 __attribute__((vector_size(16))); +typedef int32_t svec32 __attribute__((vector_size(16))); +typedef int64_t svec64 __attribute__((vector_size(16))); + static inline void clear_high(void *d, intptr_t oprsz, uint32_t desc) { intptr_t maxsz = simd_maxsz(desc); @@ -294,6 +299,150 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_shl8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shl64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) << shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_shr64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar8i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(svec8 *)(d + i) = *(svec8 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar16i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(svec16 *)(d + i) = *(svec16 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar32i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(svec32 *)(d + i) = *(svec32 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sar64i)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + int shift = simd_data(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(svec64 *)(d + i) = *(svec64 *)(a + i) >> shift; + } + clear_high(d, oprsz, desc); +} + /* The size of the alloca in the following is currently bounded to 2k. */ #define DO_ZIP(NAME, TYPE) \ diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index c4ecef0e1e..0ba61dd049 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -528,6 +528,26 @@ static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz, tcg_temp_free_i32(t0); } +static void expand_2i_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + int32_t c, bool load_dest, + void (*fni)(TCGv_i32, TCGv_i32, int32_t)) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < oprsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_i32(t1, cpu_env, dofs + i); + } + fni(t1, t0, c); + tcg_gen_st_i32(t1, cpu_env, dofs + i); + } + tcg_temp_free_i32(t0); + tcg_temp_free_i32(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ static void expand_3_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, bool load_dest, @@ -554,7 +574,7 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs, /* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, - uint32_t cofs, uint32_t opsz, + uint32_t cofs, uint32_t oprsz, void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32)) { TCGv_i32 t0 = tcg_temp_new_i32(); @@ -563,7 +583,7 @@ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, TCGv_i32 t3 = tcg_temp_new_i32(); uint32_t i; - for (i = 0; i < opsz; i += 4) { + for (i = 0; i < oprsz; i += 4) { tcg_gen_ld_i32(t1, cpu_env, aofs + i); tcg_gen_ld_i32(t2, cpu_env, bofs + i); tcg_gen_ld_i32(t3, cpu_env, cofs + i); @@ -591,6 +611,26 @@ static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz, tcg_temp_free_i64(t0); } +static void expand_2i_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + int64_t c, bool load_dest, + void (*fni)(TCGv_i64, TCGv_i64, int64_t)) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < oprsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_i64(t1, cpu_env, dofs + i); + } + fni(t1, t0, c); + tcg_gen_st_i64(t1, cpu_env, dofs + i); + } + tcg_temp_free_i64(t0); + tcg_temp_free_i64(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ static void expand_3_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, bool load_dest, @@ -617,7 +657,7 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs, /* Expand OPSZ bytes worth of three-operand operations using i64 elements. */ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, - uint32_t cofs, uint32_t opsz, + uint32_t cofs, uint32_t oprsz, void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64)) { TCGv_i64 t0 = tcg_temp_new_i64(); @@ -626,7 +666,7 @@ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, TCGv_i64 t3 = tcg_temp_new_i64(); uint32_t i; - for (i = 0; i < opsz; i += 8) { + for (i = 0; i < oprsz; i += 8) { tcg_gen_ld_i64(t1, cpu_env, aofs + i); tcg_gen_ld_i64(t2, cpu_env, bofs + i); tcg_gen_ld_i64(t3, cpu_env, cofs + i); @@ -655,6 +695,29 @@ static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_temp_free_vec(t0); } +/* Expand OPSZ bytes worth of two-vector operands and an immediate operand + using host vectors. */ +static void expand_2i_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t tysz, TCGType type, + int64_t c, bool load_dest, + void (*fni)(unsigned, TCGv_vec, TCGv_vec, int64_t)) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < oprsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + if (load_dest) { + tcg_gen_ld_vec(t1, cpu_env, dofs + i); + } + fni(vece, t1, t0, c); + tcg_gen_st_vec(t1, cpu_env, dofs + i); + } + tcg_temp_free_vec(t0); + tcg_temp_free_vec(t1); +} + /* Expand OPSZ bytes worth of three-operand operations using host vectors. */ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, @@ -682,7 +745,7 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs, /* Expand OPSZ bytes worth of four-operand operations using host vectors. */ static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs, - uint32_t bofs, uint32_t cofs, uint32_t opsz, + uint32_t bofs, uint32_t cofs, uint32_t oprsz, uint32_t tysz, TCGType type, void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec)) @@ -693,7 +756,7 @@ static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs, TCGv_vec t3 = tcg_temp_new_vec(type); uint32_t i; - for (i = 0; i < opsz; i += tysz) { + for (i = 0; i < oprsz; i += tysz) { tcg_gen_ld_vec(t1, cpu_env, aofs + i); tcg_gen_ld_vec(t2, cpu_env, bofs + i); tcg_gen_ld_vec(t3, cpu_env, cofs + i); @@ -783,6 +846,85 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs, tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno); } +void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz, + uint32_t maxsz, int64_t c, const GVecGen2i *g) +{ + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, maxsz); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && g->fniv && check_size_impl(oprsz, 32) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V256, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_2i_vec(g->vece, dofs, aofs, done, 32, TCG_TYPE_V256, + c, g->load_dest, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && g->fniv && check_size_impl(oprsz, 16) + && (!g->opc || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V128, g->vece))) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_2i_vec(g->vece, dofs, aofs, done, 16, TCG_TYPE_V128, + c, g->load_dest, g->fniv); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 && g->fniv && !g->prefer_i64 + && (!g->opc + || tcg_can_emit_vec_op(g->opc, TCG_TYPE_V64, g->vece))) { + expand_2i_vec(g->vece, dofs, aofs, done, 8, TCG_TYPE_V64, + c, g->load_dest, g->fniv); + } else if (g->fni8) { + expand_2i_i64(dofs, aofs, done, c, g->load_dest, g->fni8); + } else { + done = 0; + } + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (g->fni4 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_2i_i32(dofs, aofs, done, c, g->load_dest, g->fni4); + dofs += done; + aofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, c, g->fno); +} + /* Expand a vector three-operand operation. */ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g) @@ -1402,6 +1544,170 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); } +void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = ((0xff << c) & 0xff) * (-1ull / 0xff); + tcg_gen_shli_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = ((0xffff << c) & 0xffff) * (-1ull / 0xffff); + tcg_gen_shli_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_shl8i_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl8i, + .opc = INDEX_op_shli_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_shl16i_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl16i, + .opc = INDEX_op_shli_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_shli_i32, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl32i, + .opc = INDEX_op_shli_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_shli_i64, + .fniv = tcg_gen_shli_vec, + .fno = gen_helper_gvec_shl64i, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_debug_assert(shift < (8 << vece)); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, shift, &g[vece]); + } +} + +void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = (0xff >> c) * (-1ull / 0xff); + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t mask = (0xffff >> c) * (-1ull / 0xffff); + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(d, d, mask); +} + +void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_shr8i_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr8i, + .opc = INDEX_op_shri_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_shr16i_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr16i, + .opc = INDEX_op_shri_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_shri_i32, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr32i, + .opc = INDEX_op_shri_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_shri_i64, + .fniv = tcg_gen_shri_vec, + .fno = gen_helper_gvec_shr64i, + .opc = INDEX_op_shri_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_debug_assert(shift < (8 << vece)); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, shift, &g[vece]); + } +} + +void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t s_mask = (0x80 >> c) * (-1ull / 0xff); + uint64_t c_mask = (0xff >> c) * (-1ull / 0xff); + TCGv_i64 s = tcg_temp_new_i64(); + + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(s, d, s_mask); /* isolate (shifted) sign bit */ + tcg_gen_muli_i64(s, s, (2 << c) - 2); /* replicate isolated signs */ + tcg_gen_andi_i64(d, d, c_mask); /* clear out bits above sign */ + tcg_gen_or_i64(d, d, s); /* include sign extension */ + tcg_temp_free_i64(s); +} + +void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) +{ + uint64_t s_mask = (0x8000 >> c) * (-1ull / 0xffff); + uint64_t c_mask = (0xffff >> c) * (-1ull / 0xffff); + TCGv_i64 s = tcg_temp_new_i64(); + + tcg_gen_shri_i64(d, a, c); + tcg_gen_andi_i64(s, d, s_mask); /* isolate (shifted) sign bit */ + tcg_gen_andi_i64(d, d, c_mask); /* clear out bits above sign */ + tcg_gen_muli_i64(s, s, (2 << c) - 2); /* replicate isolated signs */ + tcg_gen_or_i64(d, d, s); /* include sign extension */ + tcg_temp_free_i64(s); +} + +void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, unsigned shift) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_sar8i_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar8i, + .opc = INDEX_op_sari_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_sar16i_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar16i, + .opc = INDEX_op_sari_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_sari_i32, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar32i, + .opc = INDEX_op_sari_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_sari_i64, + .fniv = tcg_gen_sari_vec, + .fno = gen_helper_gvec_sar64i, + .opc = INDEX_op_sari_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_debug_assert(shift < (8 << vece)); + if (shift == 0) { + tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz); + } else { + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, shift, &g[vece]); + } +} + static void do_zip(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, bool high) diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index a5d0ff89c3..b54d3de19f 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -381,6 +381,51 @@ void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a) } } +static void do_shifti(TCGOpcode opc, unsigned vece, + TCGv_vec r, TCGv_vec a, int64_t i) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(i >= 0 && i < (8 << vece)); + + if (i == 0) { + tcg_gen_mov_vec(r, a); + return; + } + + can = tcg_can_emit_vec_op(opc, type, vece); + if (can > 0) { + vec_gen_3(opc, type, vece, ri, ai, i); + } else { + /* We leave the choice of expansion via scalar or vector shift + to the target. Often, but not always, dupi can feed a vector + shift easier than a scalar. */ + tcg_debug_assert(can < 0); + tcg_expand_vec_op(opc, type, vece, ri, ai, i); + } +} + +void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i) +{ + do_shifti(INDEX_op_shli_vec, vece, r, a, i); +} + +void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i) +{ + do_shifti(INDEX_op_shri_vec, vece, r, a, i); +} + +void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i) +{ + do_shifti(INDEX_op_sari_vec, vece, r, a, i); +} + static void do_interleave(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) { diff --git a/tcg/tcg.c b/tcg/tcg.c index 33e3c03cbc..f82d6e80b0 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1403,6 +1403,18 @@ bool tcg_op_supported(TCGOpcode op) return have_vec && TCG_TARGET_HAS_andc_vec; case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + return have_vec && TCG_TARGET_HAS_shi_vec; + case INDEX_op_shls_vec: + case INDEX_op_shrs_vec: + case INDEX_op_sars_vec: + return have_vec && TCG_TARGET_HAS_shs_vec; + case INDEX_op_shlv_vec: + case INDEX_op_shrv_vec: + case INDEX_op_sarv_vec: + return have_vec && TCG_TARGET_HAS_shv_vec; case INDEX_op_zipl_vec: case INDEX_op_ziph_vec: return have_vec && TCG_TARGET_HAS_zip_vec; diff --git a/tcg/README b/tcg/README index 8ab8d3ab7e..75db47922d 100644 --- a/tcg/README +++ b/tcg/README @@ -561,6 +561,35 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. Similarly, logical operations with and without compliment. Note that VECE is unused. +* shli_vec v0, v1, i2 +* shls_vec v0, v1, s2 + + Shift all elements from v1 by a scalar i2/s2. I.e. + + for (i = 0; i < VECL/VECE; ++i) { + v0[i] = v1[i] << s2; + } + +* shri_vec v0, v1, i2 +* sari_vec v0, v1, i2 +* shrs_vec v0, v1, s2 +* sars_vec v0, v1, s2 + + Similarly for logical and arithmetic right shift. + +* shlv_vec v0, v1, v2 + + Shift elements from v1 by elements from v2. I.e. + + for (i = 0; i < VECL/VECE; ++i) { + v0[i] = v1[i] << v2[i]; + } + +* shrv_vec v0, v1, v2 +* sarv_vec v0, v1, v2 + + Similarly for logical and arithmetic right shift. + * zipl_vec v0, v1, v2 * ziph_vec v0, v1, v2 From patchwork Sat Jan 6 03:13:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123611 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp127390qgn; Fri, 5 Jan 2018 19:23:17 -0800 (PST) X-Google-Smtp-Source: ACJfBouEQRjSk2NoPa8lRp+OeVBbBQwzxpm+RUvmSmMYSy7n2rKGva0VnXgOt+WmrSlzomnhjVHG X-Received: by 10.129.128.131 with SMTP id q125mr4615105ywf.357.1515208997129; Fri, 05 Jan 2018 19:23:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208997; cv=none; d=google.com; s=arc-20160816; b=WCGiAPTuQrAv8qAcRpVuQT2Eprsc3nV5SnVZVI5LIpBJmF4GF8DLwG73JwHeJ3F7EU ZWqg9CNpR3I5jq0ml4cgxeZFgb4PwfSuUqs3EczeESEn+NVdJ0bAg2TyFRJ0yBnQ8lGG V+C3yT4NejAgxeMRDLlmUkWhnXWFd2MOb199df5Y9pv6rQhGhffH0DuS7QukNK0TPbX7 GAfwG3DSZBP5039yGB+CxCYE29boLl2wfLfwRd/DyglcE8cjvnm/22Xn1wLnPXmwtxTh UhiZovJYvsIg5Xk3JlgIBbGUslReHCJL2QfllM+Ah0NJvh9+18Qb6AjwZbFgnrq/Mouu QCfw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=hEPQ2EScbO0DclpqIpi+F4xB1ol3IS01i79+8yLyCXg=; b=EgsoJWsr+xR3FQZ6OkLBEQxe4NQemHy9vD+nk1CkIbCvgR/hCRD+nGpnPiKYhzgIb6 D7+1JImXKj5hDcO4LxE0Dq6j0PISDd9rN4nF++a/DZXHact1515rDcRVydYo6plSMmfI irFYchrO07FECn/AcfqHKVUtlx9JeSkgLXbe+BdEI+VyEOS2iGtN5nXDCYodafh8Mtqa WnODYiHic5qHDn9Tz9tANTlaf9q2goPwNd4LsP45zCebHphnT0WD2mSs1RWGkl2xizz3 +wJ1ze+nBWSGmAOMwvzhJ+CRyGEtIK/dqoLVj9KE/3dN7IttJXkaN1FbKmtPzQuMfSHR 5HnA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=XKmPd1WS; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id w187si866727ywf.481.2018.01.05.19.23.16 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:23:17 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=XKmPd1WS; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44077 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf4m-0006Wr-Gn for patch@linaro.org; Fri, 05 Jan 2018 22:23:16 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48271) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevu-0007TU-0n for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:08 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevr-0004yf-9L for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:06 -0500 Received: from mail-pf0-x243.google.com ([2607:f8b0:400e:c00::243]:36022) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevr-0004wm-1J for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:03 -0500 Received: by mail-pf0-x243.google.com with SMTP id 23so296798pfp.3 for ; Fri, 05 Jan 2018 19:14:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=hEPQ2EScbO0DclpqIpi+F4xB1ol3IS01i79+8yLyCXg=; b=XKmPd1WSjmWJ5jV6aCzEwLZMeqqw9RHx7da9Fks2NoceD9xZGhMEgCAwHVJP2wJRwc NjMkp5z4uha0doOuF6VRL5chm8AiFbBL1YmUxhOFJ4CjMA7Dgak6WuOhL5dAST785zKg Fa1eRL830ZRPzXUo92a28Vp5TwTL7OeSEWJTw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=hEPQ2EScbO0DclpqIpi+F4xB1ol3IS01i79+8yLyCXg=; b=FLsjNnpd0NJKeyq+3krmTmAC2PZN0mRgqSr7U/qa9sNOsk51DQnDIqNCFMLrtJAafa a89awgTkmBJyNSZGIqGuJ/p9n2cBXwHAKbcFMH9bvqCpDAmAz1YgVbMu8v+lLdi+I6mL H8FKULtp+56rO5N6e4qZfFYWxcezNSsDRaJGa0E3bAFnQqEIQs/M5WzERGevmalIK3S/ Bx+EHgnNwV8t0yRmDCMkWZFnfCZiJuCQznpEwjXGhG6llUpKaZPDiwbE7c7UghWCQi35 Hjgl4AEdLOsl61GqIXpTzkQVyiKNs75l2zBfz/fIoAkOGOHoMcDxuUJZEF3Pdyqk4kki US7Q== X-Gm-Message-State: AKGB3mIw22SSY1jqGJRDj7rjT0PIdog2CucdBLFzKzvg2JdRCzb/Ai+U +Z7w+alQR6zNf1Hpoq4Jh5qPwZQEwKI= X-Received: by 10.99.96.81 with SMTP id u78mr1534504pgb.427.1515208441575; Fri, 05 Jan 2018 19:14:01 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.00 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:00 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:30 -0800 Message-Id: <20180106031346.6650-8-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::243 Subject: [Qemu-devel] [PATCH v8 07/23] tcg: Add generic vector ops for comparisons X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 30 +++++++ tcg/tcg-op-gvec.h | 4 + tcg/tcg-op.h | 3 + tcg/tcg-opc.h | 2 + tcg/tcg.h | 1 + accel/tcg/tcg-runtime-gvec.c | 24 +++++ tcg/tcg-op-gvec.c | 202 +++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 23 +++++ tcg/tcg.c | 2 + tcg/README | 4 + 10 files changed, 295 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index cb05a755b8..28abf30d76 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -193,3 +193,33 @@ DEF_HELPER_FLAGS_4(gvec_trn8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_eq64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_ne8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ne16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ne32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ne64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_lt8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_lt16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_lt32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_lt64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_le8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_le16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_le32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_le64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_ltu8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ltu16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ltu32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ltu64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_leu8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_leu16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_leu32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_leu64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 7ca78853f2..c71aded066 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -220,6 +220,10 @@ void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, + uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz); + /* * 64-bit vector operations. Use these when the register has been allocated * with tcg_global_mem_new_i64, and so we cannot also address it via pointer. diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index c002523bea..777db6ad59 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -939,6 +939,9 @@ void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r, + TCGv_vec a, TCGv_vec b); + void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset); void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index a085fc077b..d3fa014507 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -248,6 +248,8 @@ DEF(uzpo_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) DEF(trne_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) DEF(trno_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) +DEF(cmp_vec, 1, 2, 1, IMPLVEC) + DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) #if TCG_TARGET_MAYBE_vec diff --git a/tcg/tcg.h b/tcg/tcg.h index bdfe058c45..ceef8742b5 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -184,6 +184,7 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_zip_vec 0 #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 +#define TCG_TARGET_HAS_cmp_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index fba62f1192..e0cde3216f 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -520,3 +520,27 @@ DO_TRN(gvec_trn8, uint8_t) DO_TRN(gvec_trn16, uint16_t) DO_TRN(gvec_trn32, uint32_t) DO_TRN(gvec_trn64, uint64_t) + +#define DO_CMP1(NAME, TYPE, OP) \ +void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t i; \ + for (i = 0; i < oprsz; i += sizeof(vec64)) { \ + *(TYPE *)(d + i) = *(TYPE *)(a + i) OP *(TYPE *)(b + i); \ + } \ + clear_high(d, oprsz, desc); \ +} + +#define DO_CMP2(SZ) \ + DO_CMP1(gvec_eq##SZ, vec##SZ, ==) \ + DO_CMP1(gvec_ne##SZ, vec##SZ, !=) \ + DO_CMP1(gvec_lt##SZ, svec##SZ, <) \ + DO_CMP1(gvec_le##SZ, svec##SZ, <=) \ + DO_CMP1(gvec_ltu##SZ, vec##SZ, <) \ + DO_CMP1(gvec_leu##SZ, vec##SZ, <=) + +DO_CMP2(8) +DO_CMP2(16) +DO_CMP2(32) +DO_CMP2(64) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 0ba61dd049..664dcae041 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -2020,3 +2020,205 @@ void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_debug_assert(vece <= MO_64); tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); } + +/* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ +static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t opsz, TCGCond cond) +{ + TCGv_i32 t0 = tcg_temp_new_i32(); + TCGv_i32 t1 = tcg_temp_new_i32(); + uint32_t i; + + for (i = 0; i < opsz; i += 4) { + tcg_gen_ld_i32(t0, cpu_env, aofs + i); + tcg_gen_ld_i32(t1, cpu_env, bofs + i); + tcg_gen_setcond_i32(cond, t0, t0, t1); + tcg_gen_neg_i32(t0, t0); + tcg_gen_st_i32(t0, cpu_env, dofs + i); + } + tcg_temp_free_i32(t1); + tcg_temp_free_i32(t0); +} + +static void expand_cmp_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, + uint32_t opsz, TCGCond cond) +{ + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + uint32_t i; + + for (i = 0; i < opsz; i += 8) { + tcg_gen_ld_i64(t0, cpu_env, aofs + i); + tcg_gen_ld_i64(t1, cpu_env, bofs + i); + tcg_gen_setcond_i64(cond, t0, t0, t1); + tcg_gen_neg_i64(t0, t0); + tcg_gen_st_i64(t0, cpu_env, dofs + i); + } + tcg_temp_free_i64(t1); + tcg_temp_free_i64(t0); +} + +static void expand_cmp_vec(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t tysz, + TCGType type, TCGCond cond) +{ + TCGv_vec t0 = tcg_temp_new_vec(type); + TCGv_vec t1 = tcg_temp_new_vec(type); + uint32_t i; + + for (i = 0; i < opsz; i += tysz) { + tcg_gen_ld_vec(t0, cpu_env, aofs + i); + tcg_gen_ld_vec(t1, cpu_env, bofs + i); + tcg_gen_cmp_vec(cond, vece, t0, t0, t1); + tcg_gen_st_vec(t0, cpu_env, dofs + i); + } + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t0); +} + +void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, + uint32_t aofs, uint32_t bofs, + uint32_t oprsz, uint32_t maxsz) +{ + static gen_helper_gvec_3 * const eq_fn[4] = { + gen_helper_gvec_eq8, gen_helper_gvec_eq16, + gen_helper_gvec_eq32, gen_helper_gvec_eq64 + }; + static gen_helper_gvec_3 * const ne_fn[4] = { + gen_helper_gvec_ne8, gen_helper_gvec_ne16, + gen_helper_gvec_ne32, gen_helper_gvec_ne64 + }; + static gen_helper_gvec_3 * const lt_fn[4] = { + gen_helper_gvec_lt8, gen_helper_gvec_lt16, + gen_helper_gvec_lt32, gen_helper_gvec_lt64 + }; + static gen_helper_gvec_3 * const le_fn[4] = { + gen_helper_gvec_le8, gen_helper_gvec_le16, + gen_helper_gvec_le32, gen_helper_gvec_le64 + }; + static gen_helper_gvec_3 * const ltu_fn[4] = { + gen_helper_gvec_ltu8, gen_helper_gvec_ltu16, + gen_helper_gvec_ltu32, gen_helper_gvec_ltu64 + }; + static gen_helper_gvec_3 * const leu_fn[4] = { + gen_helper_gvec_leu8, gen_helper_gvec_leu16, + gen_helper_gvec_leu32, gen_helper_gvec_leu64 + }; + gen_helper_gvec_3 *fn; + uint32_t tmp; + + check_size_align(oprsz, maxsz, dofs | aofs | bofs); + check_overlap_3(dofs, aofs, bofs, maxsz); + + if (cond == TCG_COND_NEVER || cond == TCG_COND_ALWAYS) { + tcg_gen_gvec_dup32i(dofs, oprsz, maxsz, -(cond == TCG_COND_ALWAYS)); + return; + } + + /* Quick check for sizes we won't support inline. */ + if (oprsz > MAX_UNROLL * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + /* Recall that ARM SVE allows vector sizes that are not a power of 2. + Expand with successively smaller host vector sizes. The intent is + that e.g. oprsz == 80 would be expanded with 2x32 + 1x16. */ + /* ??? For maxsz > oprsz, the host may be able to use an op-sized + operation, zeroing the balance of the register. We can then + use a cl-sized store to implement the clearing without an extra + store operation. This is true for aarch64 and x86_64 hosts. */ + + if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32) + && tcg_can_emit_vec_op(INDEX_op_cmp_vec, TCG_TYPE_V256, vece)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 32); + expand_cmp_vec(vece, dofs, aofs, bofs, done, 32, TCG_TYPE_V256, cond); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16) + && tcg_can_emit_vec_op(INDEX_op_cmp_vec, TCG_TYPE_V128, vece)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 16); + expand_cmp_vec(vece, dofs, aofs, bofs, done, 16, TCG_TYPE_V128, cond); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (check_size_impl(oprsz, 8)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 8); + if (TCG_TARGET_HAS_v64 + && (TCG_TARGET_REG_BITS == 32 || vece != MO_64) + && tcg_can_emit_vec_op(INDEX_op_cmp_vec, TCG_TYPE_V64, vece)) { + expand_cmp_vec(vece, dofs, aofs, bofs, done, 8, TCG_TYPE_V64, cond); + } else if (vece == MO_64) { + expand_cmp_i64(dofs, aofs, bofs, done, cond); + } else { + done = 0; + } + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (vece == MO_32 && check_size_impl(oprsz, 4)) { + uint32_t done = QEMU_ALIGN_DOWN(oprsz, 4); + expand_cmp_i32(dofs, aofs, bofs, done, cond); + dofs += done; + aofs += done; + bofs += done; + oprsz -= done; + maxsz -= done; + } + + if (oprsz == 0) { + if (maxsz != 0) { + expand_clr(dofs, maxsz); + } + return; + } + + do_ool: + switch (cond) { + case TCG_COND_EQ: + fn = eq_fn[vece]; + break; + case TCG_COND_NE: + fn = ne_fn[vece]; + break; + case TCG_COND_GT: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LT: + fn = lt_fn[vece]; + break; + case TCG_COND_GE: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LE: + fn = le_fn[vece]; + break; + case TCG_COND_GTU: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LTU: + fn = ltu_fn[vece]; + break; + case TCG_COND_GEU: + tmp = aofs, aofs = bofs, bofs = tmp; + /* fallthru */ + case TCG_COND_LEU: + fn = leu_fn[vece]; + break; + default: + g_assert_not_reached(); + } + tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, fn); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index b54d3de19f..b3f0ec324a 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -480,3 +480,26 @@ void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) { do_interleave(INDEX_op_trno_vec, vece, r, a, b); } + +void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, + TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGArg bi = temp_arg(bt); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + can = tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece); + if (can > 0) { + vec_gen_4(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond); + } +} diff --git a/tcg/tcg.c b/tcg/tcg.c index f82d6e80b0..a85547a6d2 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1392,6 +1392,7 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_and_vec: case INDEX_op_or_vec: case INDEX_op_xor_vec: + case INDEX_op_cmp_vec: return have_vec; case INDEX_op_dup2_vec: return have_vec && TCG_TARGET_REG_BITS == 32; @@ -1791,6 +1792,7 @@ void tcg_dump_ops(TCGContext *s) case INDEX_op_brcond_i64: case INDEX_op_setcond_i64: case INDEX_op_movcond_i64: + case INDEX_op_cmp_vec: if (op->args[k] < ARRAY_SIZE(cond_name) && cond_name[op->args[k]]) { col += qemu_log(",%s", cond_name[op->args[k++]]); diff --git a/tcg/README b/tcg/README index 75db47922d..18b6bbd8f1 100644 --- a/tcg/README +++ b/tcg/README @@ -630,6 +630,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. v0[2i + 1] = v2[2i + part]; } +* cmp_vec v0, v1, v2, cond + + Compare vectors by element, storing -1 for true and 0 for false. + ********* Note 1: Some shortcuts are defined when the last operand is known to be From patchwork Sat Jan 6 03:13:31 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123608 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp125642qgn; Fri, 5 Jan 2018 19:20:36 -0800 (PST) X-Google-Smtp-Source: ACJfBos5GwiElrypox7MVblzpOg0VWN/QjJAy+HfNF0gKqwOYxcbCEkdcSASwNUjDrdft2Ig6mbD X-Received: by 10.37.76.131 with SMTP id z125mr4706538yba.298.1515208836104; Fri, 05 Jan 2018 19:20:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208836; cv=none; d=google.com; s=arc-20160816; b=TJQacqj0QiImgc8OwAeexiJ26YjKJ5ocwq2zBU/s47dW1ngwjXe6zuOA9oHCTG2aE3 uNmsoOPE8rTV1TyUBGeU/AOEwdN48Zv1JaZppIO/gihkQ0Z+mz7xZ20v8jS1wZJeO/dJ evEyqpBAUYQzcX48WxZAniql+L8JCVg9PnMBm/h3a6TscPhYqPLkDk0eE3kq5k6VmZ5r hRT4shmhvtyj48Szw6Q+CcH11J0qnI0X68kOSzaVAfFyh1Xg/JRRVLCyG4Bt+UwveeKI gD1YXh3o4tVxswIqibzdSykNokiyGm5IKHmPvKBlYzLRncCQsp3uzJJDc75XlgqO6cLs QB6A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=z+SN4+gEyLJC1vsqYBGe2sCFGxtyymMgohuvdwfs/+E=; b=Q9bzbjrujGRISnEeyFRSgoFIjonYMpVaFoWSfS3+j++y52MHJxPBUHplfLynCaGWMW g7yO2MMVs+U8K83adLikFAhz8PZT5KbUeF+HNDdaFnDuPiqL1iWKpaj+5HsfaWxbhDyg t/arQ5pNUVYKQhovWVS+38uXYa+Njl5p8ZFPaAt+KK8uhLf7ka3vz0zMDkq2yEqkZ52X 3RxPcX9q33FQIRYKoAszvZ8jlmBcmGvylYHfirocuuNWDOhc1wwRhe9IpX+CwdgPUyEI Xpvv1ft/nNfB2S/EjOC9U1cys7D1VCrjWQtvmtrRZTdh1IESzu1M3meAgiW7aGpMj/8H niVQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=iKROvSVp; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id c9si1547583ybg.190.2018.01.05.19.20.36 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:20:36 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=iKROvSVp; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44061 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf2B-0003za-Hy for patch@linaro.org; Fri, 05 Jan 2018 22:20:35 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48277) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevu-0007Tp-BQ for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:07 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevs-00051q-ON for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:06 -0500 Received: from mail-pl0-x242.google.com ([2607:f8b0:400e:c01::242]:32858) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevs-00050G-F7 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:04 -0500 Received: by mail-pl0-x242.google.com with SMTP id 1so4240867plv.0 for ; Fri, 05 Jan 2018 19:14:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=z+SN4+gEyLJC1vsqYBGe2sCFGxtyymMgohuvdwfs/+E=; b=iKROvSVprMZ2WwGlzlVYxyFL1tCAEVKG2csRPwMgu3hy9QMWRCgwGcbzYC98V6E5Nk aSDdHJnVKgKhyNTsYDimoc9jXJ3NWD5Xg91LqYcyOhb1sp3KvHdqi9NIjw1bwlSJo5fF Bd4UXtHmhtnssjeWUlSi2RR4n3xf2NcCHn3Wk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=z+SN4+gEyLJC1vsqYBGe2sCFGxtyymMgohuvdwfs/+E=; b=EK3iayfII0+MgXb65c+97VBv29+JekGZk8DIyE1hsyt/OAvwbjK+GPElSAJ5H5xwrJ C+YoPYm4jKuVbkzYlRBCcCvOLNbsm+88j3TJF+6Ch0JYsOWkxIqiCMrOGZc6pqYt7FoT f1lfx7pTQayrgU1rM88tFgvsLMwvR8zArep8tkp/altpWs3cTvB6QXYFVoA7cZVBZSNs OfFYZAoH6VP7nbyQMHwfB1xQ8EQhoMizwiFLVmfAz6vESGhgTJcH5u0ld+NPJ+P7ssqO KLhJj/2Mh1niKroZQfD8AKs0PSXOuFPtAN/cK0Bwc/kGuAXZGuQBxCQC+saL/6Ipe41c EAlA== X-Gm-Message-State: AKGB3mIY18X1WhBlZpS08rbJOyMOFIDE0cGhl5ywVwZuEBOa4Kdyyaf3 oRXSf68//+5ha2lz4E6p8mHs/B6LFAE= X-Received: by 10.159.241.152 with SMTP id s24mr5229492plr.229.1515208443165; Fri, 05 Jan 2018 19:14:03 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.01 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:02 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:31 -0800 Message-Id: <20180106031346.6650-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::242 Subject: [Qemu-devel] [PATCH v8 08/23] tcg: Add generic vector ops for multiplication X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 5 +++++ tcg/tcg-op-gvec.h | 2 ++ tcg/tcg-op.h | 1 + tcg/tcg-opc.h | 1 + tcg/tcg.h | 1 + accel/tcg/tcg-runtime-gvec.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-gvec.c | 29 +++++++++++++++++++++++++++++ tcg/tcg-op-vec.c | 22 ++++++++++++++++++++++ tcg/tcg.c | 2 ++ tcg/README | 4 ++++ 10 files changed, 111 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index 28abf30d76..c4a2e6b215 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -152,6 +152,11 @@ DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index c71aded066..41839a61a6 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -176,6 +176,8 @@ void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t oprsz, uint32_t maxsz); void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 777db6ad59..f967790cd9 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -920,6 +920,7 @@ void tcg_gen_movi_v128(TCGv_vec, uint64_t, uint64_t); void tcg_gen_movi_v256(TCGv_vec, uint64_t, uint64_t, uint64_t, uint64_t); void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_or_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_xor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index d3fa014507..b21a30273c 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -220,6 +220,7 @@ DEF(st_vec, 0, 2, 1, IMPLVEC) DEF(add_vec, 1, 2, 0, IMPLVEC) DEF(sub_vec, 1, 2, 0, IMPLVEC) +DEF(mul_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_mul_vec)) DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec)) DEF(and_vec, 1, 2, 0, IMPLVEC) diff --git a/tcg/tcg.h b/tcg/tcg.h index ceef8742b5..64078a2e92 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -185,6 +185,7 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_uzp_vec 0 #define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_HAS_cmp_vec 0 +#define TCG_TARGET_HAS_mul_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index e0cde3216f..9406ccd769 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -141,6 +141,50 @@ void HELPER(gvec_sub64)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_mul8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) * *(vec8 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mul16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) * *(vec16 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mul32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) * *(vec32 *)(b + i); + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_mul64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) * *(vec64 *)(b + i); + } + clear_high(d, oprsz, desc); +} + void HELPER(gvec_neg8)(void *d, void *a, uint32_t desc) { intptr_t oprsz = simd_oprsz(desc); diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 664dcae041..5e970cb2aa 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1404,6 +1404,35 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); } +void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul8, + .opc = INDEX_op_mul_vec, + .vece = MO_8 }, + { .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul16, + .opc = INDEX_op_mul_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_mul_i32, + .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul32, + .opc = INDEX_op_mul_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_mul_i64, + .fniv = tcg_gen_mul_vec, + .fno = gen_helper_gvec_mul64, + .opc = INDEX_op_mul_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + /* Perform a vector negation using normal negation and a mask. Compare gen_subv_mask above. */ static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m) diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index b3f0ec324a..9038cc6c84 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -503,3 +503,25 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, tcg_expand_vec_op(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond); } } + +void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGTemp *bt = tcgv_vec_temp(b); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGArg bi = temp_arg(bt); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + tcg_debug_assert(bt->base_type == type); + can = tcg_can_emit_vec_op(INDEX_op_mul_vec, type, vece); + if (can > 0) { + vec_gen_3(INDEX_op_mul_vec, type, vece, ri, ai, bi); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(INDEX_op_mul_vec, type, vece, ri, ai, bi); + } +} diff --git a/tcg/tcg.c b/tcg/tcg.c index a85547a6d2..5608391dca 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1404,6 +1404,8 @@ bool tcg_op_supported(TCGOpcode op) return have_vec && TCG_TARGET_HAS_andc_vec; case INDEX_op_orc_vec: return have_vec && TCG_TARGET_HAS_orc_vec; + case INDEX_op_mul_vec: + return have_vec && TCG_TARGET_HAS_mul_vec; case INDEX_op_shli_vec: case INDEX_op_shri_vec: case INDEX_op_sari_vec: diff --git a/tcg/README b/tcg/README index 18b6bbd8f1..17695ff7f6 100644 --- a/tcg/README +++ b/tcg/README @@ -547,6 +547,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. Similarly, v0 = v1 - v2. +* mul_vec v0, v1, v2 + + Similarly, v0 = v1 * v2. + * neg_vec v0, v1 Similarly, v0 = -v1. From patchwork Sat Jan 6 03:13:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123602 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp122435qgn; Fri, 5 Jan 2018 19:15:46 -0800 (PST) X-Google-Smtp-Source: ACJfBos69JtLMnL71IPW1LhUtOlp5ifrysSKc4UAoUHHcCHs9vRxrzd7998qDM4Jh+eR0xRc1HAd X-Received: by 10.129.169.136 with SMTP id g130mr4596998ywh.250.1515208546492; Fri, 05 Jan 2018 19:15:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208546; cv=none; d=google.com; s=arc-20160816; b=LTxaEPBqjAKn0M9POFtVvFx+hKV9a4lpSB9QJ3Px2yOoShbtY4IC1gc5cNHg4X9rbr obZQ4LH0AkLLgvemmYLBpDDFNfF0DFc0SrraWfh2Gj7xtsCz6B1MdkytoYMRWoDk9gWr DOxkT74zIDwoizoR10bnT3haErJoo0cZNDVoFvIwf7A0LBSdvbAfhCaXGx7ocpLRQMGA Sx1Z6Gi8s2W3G3fxH2GCBgCydca2+73DdZXnAMfLgl2IJ58HxPLQf/lD74DXHzYLIFj0 hUxatjEq40hIj5Avz2Q+b/xMih4sAMXfeaPXrxZtWvb+Edu3IRPj6165mNLXmsfdt/sp 91Qw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=Tv23mgBT0GiWZsGmF4XllvFOvYugbvV9Q1wwe5wEyJE=; b=yQ+s0bzSXtYTFLGSgLVHlrmMBULPevX67/9aonNY6Gu7DioRUuxzdpOcwl7Qu/tppI p0Wx+u1ncMXU+MEJV9C9Y7XJdj+MSg08B95rmdU14NdDBrVKqwGy68GLuFC7d8puHnNk OnRFhyQWWD9e9m+u75jF9dC/vMKadvgyV51aDn8GrT9swi/+MlYZA4Qwp+TirQjv/gAb vFdahukdVAyRl1J2vHA97GyjEcamwsgak0vqYfVeILmdNJtZuHX1IzMhgM2OAZmMQiwg EkAbj8LEOcZHLkM8knEYmhLY1yWzDJu1hdku86pBShauv6M/IcOcIn3jViTamjlfyxpr CRCw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=d3lILqC4; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id x186si1464825ywf.561.2018.01.05.19.15.46 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:15:46 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=d3lILqC4; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44034 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXexV-000087-Qm for patch@linaro.org; Fri, 05 Jan 2018 22:15:45 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48324) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevx-0007WH-4k for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:11 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevu-000561-EH for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:09 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:46285) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevu-00053t-55 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:06 -0500 Received: by mail-pl0-x244.google.com with SMTP id i6so4222971plt.13 for ; Fri, 05 Jan 2018 19:14:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=Tv23mgBT0GiWZsGmF4XllvFOvYugbvV9Q1wwe5wEyJE=; b=d3lILqC4H8FJZ+pHMIrE15zGgxZG2psva8xY4N9pvTXCCLNOE8b4rd1SLFlri/JqPf FXMuINMSPQBasBWznUzfz93sN33prEiRnTtbnYTNpIlwhZ3jdDHO/VrSSAk6V+XUe0rs KRbdOW00QEvM2qCqLJe2FERU7P2n/dSdoMdcc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=Tv23mgBT0GiWZsGmF4XllvFOvYugbvV9Q1wwe5wEyJE=; b=iW+9RbNzrkKV/56gkojJe6HIMO0CyLSagTSOCXfFY2ywZPSKrWgdhSVTVjHAlPZptZ 8wItSvP5BSRLSZPOYrWfjbzU7Xm+XnnKxbhfwdpNd6y2CElZ9wmM8O87LfZPnTUPj5QT w2W282KbqtkKWJ3Sd6ZVRyO3w93lREGoMF8iV7jXAm6uCpmgCKzx8x8IvU6bZdXGASo+ 64QWPfKFe8mXdOBD8PfYeTowZwwfnFnFNqC4ISSAm6PaYdqjDxX9eoLlBIfhg1ed4cUo Z+LglH5jQjyEHqFbjPCSXikjw0BlmoPbNLOT/hfWJ/TMfGGdWq7wWlDAC8eoLKx+CO0P XLjQ== X-Gm-Message-State: AKGB3mJfXo0fME/xT8oWbuALo+fck9/xy6DdQdtWIhtNvIlQpjNhMDrt E0IdS30jF49QIDnEVUm5UgZW6x7mE3w= X-Received: by 10.84.224.133 with SMTP id s5mr367294plj.438.1515208444737; Fri, 05 Jan 2018 19:14:04 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.03 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:03 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:32 -0800 Message-Id: <20180106031346.6650-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v8 09/23] tcg: Add generic vector ops for extension X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 8 +++ tcg/tcg-op-gvec.h | 9 +++ tcg/tcg-op.h | 5 ++ tcg/tcg-opc.h | 5 ++ tcg/tcg.h | 2 + accel/tcg/tcg-runtime-gvec.c | 26 ++++++++ tcg/tcg-op-gvec.c | 154 ++++++++++++++++++++++++++++++++++++++++--- tcg/tcg-op-vec.c | 39 +++++++++++ tcg/tcg.c | 6 ++ tcg/README | 13 ++++ 10 files changed, 259 insertions(+), 8 deletions(-) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index c4a2e6b215..d1b3542946 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -199,6 +199,14 @@ DEF_HELPER_FLAGS_4(gvec_trn16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_trn64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_extu8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_extu16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_extu32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(gvec_exts8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_exts16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_exts32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 41839a61a6..91daf9e6ef 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -222,6 +222,15 @@ void tcg_gen_gvec_trne(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_extul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_extuh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_extsl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_extsh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); + void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index f967790cd9..28a5cbe47a 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -940,6 +940,11 @@ void tcg_gen_uzpo_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trne_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); void tcg_gen_trno_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); +void tcg_gen_extul_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_extuh_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_extsl_vec(unsigned vece, TCGv_vec r, TCGv_vec a); +void tcg_gen_extsh_vec(unsigned vece, TCGv_vec r, TCGv_vec a); + void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b); diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h index b21a30273c..3dfd872a0f 100644 --- a/tcg/tcg-opc.h +++ b/tcg/tcg-opc.h @@ -249,6 +249,11 @@ DEF(uzpo_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_uzp_vec)) DEF(trne_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) DEF(trno_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_trn_vec)) +DEF(extul_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_extl_vec)) +DEF(extuh_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_exth_vec)) +DEF(extsl_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_extl_vec)) +DEF(extsh_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_exth_vec)) + DEF(cmp_vec, 1, 2, 1, IMPLVEC) DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT) diff --git a/tcg/tcg.h b/tcg/tcg.h index 64078a2e92..737550385d 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -186,6 +186,8 @@ typedef uint64_t TCGRegSet; #define TCG_TARGET_HAS_trn_vec 0 #define TCG_TARGET_HAS_cmp_vec 0 #define TCG_TARGET_HAS_mul_vec 0 +#define TCG_TARGET_HAS_extl_vec 0 +#define TCG_TARGET_HAS_exth_vec 0 #else #define TCG_TARGET_MAYBE_vec 1 #endif diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index 9406ccd769..ff26be0744 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -588,3 +588,29 @@ DO_CMP2(8) DO_CMP2(16) DO_CMP2(32) DO_CMP2(64) + +#define DO_EXT(NAME, TYPE1, TYPE2) \ +void HELPER(NAME)(void *d, void *a, uint32_t desc) \ +{ \ + intptr_t oprsz = simd_oprsz(desc); \ + intptr_t oprsz_2 = oprsz / 2; \ + intptr_t i; \ + /* We produce output faster than we consume input. \ + Therefore we must be mindful of possible overlap. */ \ + if (unlikely((a - d) < (uintptr_t)oprsz)) { \ + void *a_new = alloca(oprsz_2); \ + memcpy(a_new, a, oprsz_2); \ + a = a_new; \ + } \ + for (i = 0; i < oprsz_2; i += sizeof(TYPE1)) { \ + *(TYPE2 *)(d + 2 * i) = *(TYPE1 *)(a + i); \ + } \ + clear_high(d, oprsz, desc); \ +} + +DO_EXT(gvec_extu8, uint8_t, uint16_t) +DO_EXT(gvec_extu16, uint16_t, uint32_t) +DO_EXT(gvec_extu32, uint32_t, uint64_t) +DO_EXT(gvec_exts8, int8_t, int16_t) +DO_EXT(gvec_exts16, int16_t, int32_t) +DO_EXT(gvec_exts32, int32_t, int64_t) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 5e970cb2aa..ec273de3dc 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1405,7 +1405,7 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, } void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, - uint32_t bofs, uint32_t opsz, uint32_t maxsz) + uint32_t bofs, uint32_t oprsz, uint32_t maxsz) { static const GVecGen3 g[4] = { { .fniv = tcg_gen_mul_vec, @@ -1430,7 +1430,7 @@ void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, }; tcg_debug_assert(vece <= MO_64); - tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); + tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); } /* Perform a vector negation using normal negation and a mask. @@ -2052,13 +2052,13 @@ void tcg_gen_gvec_trno(unsigned vece, uint32_t dofs, uint32_t aofs, /* Expand OPSZ bytes worth of three-operand operations using i32 elements. */ static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, - uint32_t opsz, TCGCond cond) + uint32_t oprsz, TCGCond cond) { TCGv_i32 t0 = tcg_temp_new_i32(); TCGv_i32 t1 = tcg_temp_new_i32(); uint32_t i; - for (i = 0; i < opsz; i += 4) { + for (i = 0; i < oprsz; i += 4) { tcg_gen_ld_i32(t0, cpu_env, aofs + i); tcg_gen_ld_i32(t1, cpu_env, bofs + i); tcg_gen_setcond_i32(cond, t0, t0, t1); @@ -2070,13 +2070,13 @@ static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs, } static void expand_cmp_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, - uint32_t opsz, TCGCond cond) + uint32_t oprsz, TCGCond cond) { TCGv_i64 t0 = tcg_temp_new_i64(); TCGv_i64 t1 = tcg_temp_new_i64(); uint32_t i; - for (i = 0; i < opsz; i += 8) { + for (i = 0; i < oprsz; i += 8) { tcg_gen_ld_i64(t0, cpu_env, aofs + i); tcg_gen_ld_i64(t1, cpu_env, bofs + i); tcg_gen_setcond_i64(cond, t0, t0, t1); @@ -2088,14 +2088,14 @@ static void expand_cmp_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs, } static void expand_cmp_vec(unsigned vece, uint32_t dofs, uint32_t aofs, - uint32_t bofs, uint32_t opsz, uint32_t tysz, + uint32_t bofs, uint32_t oprsz, uint32_t tysz, TCGType type, TCGCond cond) { TCGv_vec t0 = tcg_temp_new_vec(type); TCGv_vec t1 = tcg_temp_new_vec(type); uint32_t i; - for (i = 0; i < opsz; i += tysz) { + for (i = 0; i < oprsz; i += tysz) { tcg_gen_ld_vec(t0, cpu_env, aofs + i); tcg_gen_ld_vec(t1, cpu_env, bofs + i); tcg_gen_cmp_vec(cond, vece, t0, t0, t1); @@ -2251,3 +2251,141 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs, } tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, fn); } + +static void do_ext(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz, bool high, bool is_sign) +{ + static gen_helper_gvec_2 * const extu_fn[3] = { + gen_helper_gvec_extu8, gen_helper_gvec_extu16, gen_helper_gvec_extu32 + }; + static gen_helper_gvec_2 * const exts_fn[3] = { + gen_helper_gvec_exts8, gen_helper_gvec_exts16, gen_helper_gvec_exts32 + }; + + TCGType type; + uint32_t step, i, n; + TCGOpcode opc; + + check_size_align(oprsz, maxsz, dofs | aofs); + check_overlap_2(dofs, aofs, oprsz); + tcg_debug_assert(vece < MO_64); + + /* Quick check for sizes we won't support inline. */ + if (oprsz > 4 * 32 || maxsz > MAX_UNROLL * 32) { + goto do_ool; + } + + opc = is_sign ? (high ? INDEX_op_extsh_vec : INDEX_op_extsl_vec) + : (high ? INDEX_op_extuh_vec : INDEX_op_extul_vec); + + /* Since these operations don't operate in lock-step lanes, + we must care for overlap. */ + if (TCG_TARGET_HAS_v256 && oprsz % 32 == 0 && oprsz / 32 <= 8 + && tcg_can_emit_vec_op(opc, TCG_TYPE_V256, vece)) { + type = TCG_TYPE_V256; + step = 32; + n = oprsz / 32; + } else if (TCG_TARGET_HAS_v128 && oprsz % 16 == 0 && oprsz / 16 <= 8 + && tcg_can_emit_vec_op(opc, TCG_TYPE_V128, vece)) { + type = TCG_TYPE_V128; + step = 16; + n = oprsz / 16; + } else if (TCG_TARGET_HAS_v64 && oprsz % 8 == 0 && oprsz / 8 <= 8 + && tcg_can_emit_vec_op(opc, TCG_TYPE_V64, vece)) { + type = TCG_TYPE_V64; + step = 8; + n = oprsz / 8; + } else { + goto do_ool; + } + + if (n == 1) { + TCGv_vec t1 = tcg_temp_new_vec(type); + + tcg_gen_ld_vec(t1, cpu_env, aofs); + if (high) { + if (is_sign) { + tcg_gen_extsh_vec(vece, t1, t1); + } else { + tcg_gen_extuh_vec(vece, t1, t1); + } + } else { + if (is_sign) { + tcg_gen_extsl_vec(vece, t1, t1); + } else { + tcg_gen_extul_vec(vece, t1, t1); + } + } + tcg_gen_st_vec(t1, cpu_env, dofs); + tcg_temp_free_vec(t1); + } else { + TCGv_vec ta[4], tmp; + + if (high) { + aofs += oprsz / 2; + } + + for (i = 0; i < (n / 2 + n % 2); ++i) { + ta[i] = tcg_temp_new_vec(type); + tcg_gen_ld_vec(ta[i], cpu_env, aofs + i * step); + } + + tmp = tcg_temp_new_vec(type); + for (i = 0; i < n; ++i) { + if (i & 1) { + if (is_sign) { + tcg_gen_extsh_vec(vece, tmp, ta[i / 2]); + } else { + tcg_gen_extuh_vec(vece, tmp, ta[i / 2]); + } + } else { + if (is_sign) { + tcg_gen_extsl_vec(vece, tmp, ta[i / 2]); + } else { + tcg_gen_extul_vec(vece, tmp, ta[i / 2]); + } + } + tcg_gen_st_vec(tmp, cpu_env, dofs + i * step); + } + tcg_temp_free_vec(tmp); + + for (i = 0; i < (n / 2 + n % 2); ++i) { + tcg_temp_free_vec(ta[i]); + } + } + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + return; + + do_ool: + if (high) { + aofs += oprsz / 2; + } + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, 0, + is_sign ? exts_fn[vece] : extu_fn[vece]); +} + +void tcg_gen_gvec_extul(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, false, false); +} + +void tcg_gen_gvec_extuh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, true, false); +} + +void tcg_gen_gvec_extsl(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, false, true); +} + +void tcg_gen_gvec_extsh(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + do_ext(vece, dofs, aofs, oprsz, maxsz, true, true); +} diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c index 9038cc6c84..a73d094ddb 100644 --- a/tcg/tcg-op-vec.c +++ b/tcg/tcg-op-vec.c @@ -525,3 +525,42 @@ void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b) tcg_expand_vec_op(INDEX_op_mul_vec, type, vece, ri, ai, bi); } } + +static void do_ext(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a) +{ + TCGTemp *rt = tcgv_vec_temp(r); + TCGTemp *at = tcgv_vec_temp(a); + TCGArg ri = temp_arg(rt); + TCGArg ai = temp_arg(at); + TCGType type = rt->base_type; + int can; + + tcg_debug_assert(at->base_type == type); + can = tcg_can_emit_vec_op(opc, type, vece); + if (can > 0) { + vec_gen_2(opc, type, vece, ri, ai); + } else { + tcg_debug_assert(can < 0); + tcg_expand_vec_op(opc, type, vece, ri, ai); + } +} + +void tcg_gen_extul_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extul_vec, vece, r, a); +} + +void tcg_gen_extuh_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extuh_vec, vece, r, a); +} + +void tcg_gen_extsl_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extsl_vec, vece, r, a); +} + +void tcg_gen_extsh_vec(unsigned vece, TCGv_vec r, TCGv_vec a) +{ + do_ext(INDEX_op_extsh_vec, vece, r, a); +} diff --git a/tcg/tcg.c b/tcg/tcg.c index 5608391dca..8c0ee0a9db 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -1427,6 +1427,12 @@ bool tcg_op_supported(TCGOpcode op) case INDEX_op_trne_vec: case INDEX_op_trno_vec: return have_vec && TCG_TARGET_HAS_trn_vec; + case INDEX_op_extul_vec: + case INDEX_op_extsl_vec: + return have_vec && TCG_TARGET_HAS_extl_vec; + case INDEX_op_extuh_vec: + case INDEX_op_extsh_vec: + return have_vec && TCG_TARGET_HAS_exth_vec; default: tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS); diff --git a/tcg/README b/tcg/README index 17695ff7f6..56c70764bc 100644 --- a/tcg/README +++ b/tcg/README @@ -634,6 +634,19 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32. v0[2i + 1] = v2[2i + part]; } +* extul_vec v0, v1 + + Extend unsigned the low VECL/VECE/2 elements of v1 into v0. + +* extuh_vec v0, v1 + + Similarly for the high VECL/VECE/2 elements. + +* extsl_vec v0, v1 +* extsh_vec v0, v1 + + Similarly with signed extension. + * cmp_vec v0, v1, v2, cond Compare vectors by element, storing -1 for true and 0 for false. From patchwork Sat Jan 6 03:13:33 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123606 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp124595qgn; Fri, 5 Jan 2018 19:19:00 -0800 (PST) X-Google-Smtp-Source: ACJfBosKhPQbln/K4ZQfflQcHkABchEhOxYibj4Y31+5ofDnAuf20aavk0t4BPAhgEMsGs2ykHju X-Received: by 10.129.175.9 with SMTP id n9mr4720000ywh.149.1515208740824; Fri, 05 Jan 2018 19:19:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208740; cv=none; d=google.com; s=arc-20160816; b=v/WVPqJWYRWMeIZP9Zjvf4vHByunPHmbCsvZqU6yQv6eeW6HuVug/Y1xQ6gvp3TiCt gPQn3nvqxJ3ypbf8EVSliRNSFm84pyFyGcNj1TUoFtso4BqPIlhImKUQelbAZwhcX8H2 2Mxm0W0ORK5FU2YoGdh8K+tD6hk9VB2yEAelKjEoWb7YrAlbULx/oUR5N+OysBGcvyQR fABVjXaVUO8bt/ow73BWYycuT/jx1lyXAy5SyHwTV2xdk9TpwYRr02vkTtTUFjIUaA6Z 4wZ6oKEeMjuQDDanJ1y6k7eIYTLT4MzoBZ5Mnx92HKbSNG06ILVGYSakMOin0NQGELav MkbQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=gZ4/CIwgPeKPPZOkNDKb2k6qG+YTuYZ5ofWsXY5Ka+0=; b=eOSH8wsN09LhlZHgQHJIh/FrmwojZVRaj09IWNKNAmWawH9zIQknXYENOYi4W8J2HU biaUaEeQw5mbVEsGuRBuEQzDTMVKDwAl8FtIg/yodPrcDJn22WesIbl65vM0igVqFF9n d5+1lNnYnoqv9yVOousqT5X9uOGzTaM1Zg4Eb56oUifUdvPgOUF9pYEs35ZmPSTg5IFH B+tbcdlCW7O+c+b5o5zsAt8zkNUpIT5FoGbv/45X4bN4AdIMwiAiO7oZRBhykzRVL9Ij 3uxNKmf4OpeHGRoe9fEdytqG2kfEam4POyuuMtHMsxKPQAHaGWM6lugDHBTdD7nYP9sI c9ug== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=KmVDhKqv; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id s100si1501233ybi.647.2018.01.05.19.19.00 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:19:00 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=KmVDhKqv; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44048 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf0e-0002ZE-9C for patch@linaro.org; Fri, 05 Jan 2018 22:19:00 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48337) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevy-0007XR-5R for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:12 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevv-00059J-Qd for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:10 -0500 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:39158) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevv-00057B-Fz for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:07 -0500 Received: by mail-pf0-x241.google.com with SMTP id l24so2963500pfj.6 for ; Fri, 05 Jan 2018 19:14:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=gZ4/CIwgPeKPPZOkNDKb2k6qG+YTuYZ5ofWsXY5Ka+0=; b=KmVDhKqvfzq2q08kVg0J4YIXhI+npU5d7atF0KLW0I07+CMaI/3wX0MeQrhDydkTog pjMnWFiyz8HB02HepuOGmLjx+LhKlA35oVfV2PMFOvP/N9cvFR5T1zeuu20GFstSL8iL M0B7eDh7KJsQthFMrJBga1PI/1XS4uw6H6OIo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=gZ4/CIwgPeKPPZOkNDKb2k6qG+YTuYZ5ofWsXY5Ka+0=; b=dqgA3VSYs0mtlmwdr56SHUn2m56AJjgmnsXzs9SogeSRx+yP/2DZZOfidXUYMBS2u1 VZQBKFyDqxbz0XMeYDl7bh5CcxXxHNhsfhpJP0Wn8rKzPMN7WYVTOae+5sQLMnxbz3I2 LFIY7slb8TeDj8leHmtMs/T+EZ0mxRLWs8BMV7AvFjBMfpyEX2kaqmnp25KdN9yKWhmv 7StNhO6r0SO0LgeyryZyP2r7Vww7ohN6n/ElHUO8JTjciD1vDopinFSb5K++CE8h0JaP 7PGxnlkLFXgVnlu3quq1FgTU5ivztESBtZJI/m2gA5ntHZWlwmq/rg84xJ8XeLdyKwEF O7EA== X-Gm-Message-State: AKGB3mKEwPfiiFJHce5EAtoCJ3I8l05yFGAP10+iIJlr3YYrLkdCztTY JgvYVEFQSM3gKv82ppWv7L1rPuiFEIU= X-Received: by 10.98.86.217 with SMTP id h86mr4775568pfj.105.1515208446110; Fri, 05 Jan 2018 19:14:06 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.04 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:05 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:33 -0800 Message-Id: <20180106031346.6650-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v8 10/23] tcg: Add generic helpers for saturating arithmetic X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" No vector ops as yet. SSE only has direct support for 8- and 16-bit saturation; handling 32- and 64-bit saturation is much more expensive. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 20 ++++ tcg/tcg-op-gvec.h | 10 ++ accel/tcg/tcg-runtime-gvec.c | 268 +++++++++++++++++++++++++++++++++++++++++++ tcg/tcg-op-gvec.c | 92 +++++++++++++++ 4 files changed, 390 insertions(+) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index d1b3542946..ec187a094b 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -157,6 +157,26 @@ DEF_HELPER_FLAGS_4(gvec_mul16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ssadd64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_sssub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sssub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sssub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_sssub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_usadd8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_usadd16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_usadd32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_usadd64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_ussub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ussub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ussub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_ussub64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(gvec_neg8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index 91daf9e6ef..bcdf0eb413 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -179,6 +179,16 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +/* Saturated arithmetic. */ +void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_sssub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); +void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t clsz); + void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs, diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index ff26be0744..e84c900670 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -614,3 +614,271 @@ DO_EXT(gvec_extu32, uint32_t, uint64_t) DO_EXT(gvec_exts8, int8_t, int16_t) DO_EXT(gvec_exts16, int16_t, int32_t) DO_EXT(gvec_exts32, int32_t, int64_t) + +void HELPER(gvec_ssadd8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int8_t)) { + int r = *(int8_t *)(a + i) + *(int8_t *)(b + i); + if (r > INT8_MAX) { + r = INT8_MAX; + } else if (r < INT8_MIN) { + r = INT8_MIN; + } + *(int8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ssadd16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int16_t)) { + int r = *(int16_t *)(a + i) + *(int16_t *)(b + i); + if (r > INT16_MAX) { + r = INT16_MAX; + } else if (r < INT16_MIN) { + r = INT16_MIN; + } + *(int16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ssadd32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int32_t)) { + int32_t ai = *(int32_t *)(a + i); + int32_t bi = *(int32_t *)(b + i); + int32_t di = ai + bi; + if (((di ^ ai) &~ (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT32_MAX : INT32_MIN); + } + *(int32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ssadd64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int64_t)) { + int64_t ai = *(int64_t *)(a + i); + int64_t bi = *(int64_t *)(b + i); + int64_t di = ai + bi; + if (((di ^ ai) &~ (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT64_MAX : INT64_MIN); + } + *(int64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint8_t)) { + int r = *(int8_t *)(a + i) - *(int8_t *)(b + i); + if (r > INT8_MAX) { + r = INT8_MAX; + } else if (r < INT8_MIN) { + r = INT8_MIN; + } + *(uint8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int16_t)) { + int r = *(int16_t *)(a + i) - *(int16_t *)(b + i); + if (r > INT16_MAX) { + r = INT16_MAX; + } else if (r < INT16_MIN) { + r = INT16_MIN; + } + *(int16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int32_t)) { + int32_t ai = *(int32_t *)(a + i); + int32_t bi = *(int32_t *)(b + i); + int32_t di = ai - bi; + if (((di ^ ai) & (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT32_MAX : INT32_MIN); + } + *(int32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_sssub64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(int64_t)) { + int64_t ai = *(int64_t *)(a + i); + int64_t bi = *(int64_t *)(b + i); + int64_t di = ai - bi; + if (((di ^ ai) & (ai ^ bi)) < 0) { + /* Signed overflow. */ + di = (di < 0 ? INT64_MAX : INT64_MIN); + } + *(int64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint8_t)) { + unsigned r = *(uint8_t *)(a + i) + *(uint8_t *)(b + i); + if (r > UINT8_MAX) { + r = UINT8_MAX; + } + *(uint8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint16_t)) { + unsigned r = *(uint16_t *)(a + i) + *(uint16_t *)(b + i); + if (r > UINT16_MAX) { + r = UINT16_MAX; + } + *(uint16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint32_t)) { + uint32_t ai = *(uint32_t *)(a + i); + uint32_t bi = *(uint32_t *)(b + i); + uint32_t di = ai + bi; + if (di < ai) { + di = UINT32_MAX; + } + *(uint32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_usadd64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint64_t)) { + uint64_t ai = *(uint64_t *)(a + i); + uint64_t bi = *(uint64_t *)(b + i); + uint64_t di = ai + bi; + if (di < ai) { + di = UINT64_MAX; + } + *(uint64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub8)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint8_t)) { + int r = *(uint8_t *)(a + i) - *(uint8_t *)(b + i); + if (r < 0) { + r = 0; + } + *(uint8_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub16)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint16_t)) { + int r = *(uint16_t *)(a + i) - *(uint16_t *)(b + i); + if (r < 0) { + r = 0; + } + *(uint16_t *)(d + i) = r; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub32)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint32_t)) { + uint32_t ai = *(uint32_t *)(a + i); + uint32_t bi = *(uint32_t *)(b + i); + uint32_t di = ai - bi; + if (ai < bi) { + di = 0; + } + *(uint32_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ussub64)(void *d, void *a, void *b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(uint64_t)) { + uint64_t ai = *(uint64_t *)(a + i); + uint64_t bi = *(uint64_t *)(b + i); + uint64_t di = ai - bi; + if (ai < bi) { + di = 0; + } + *(uint64_t *)(d + i) = di; + } + clear_high(d, oprsz, desc); +} diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index ec273de3dc..39799f9c3e 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1433,6 +1433,98 @@ void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); } +void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_ssadd8, .vece = MO_8 }, + { .fno = gen_helper_gvec_ssadd16, .vece = MO_16 }, + { .fno = gen_helper_gvec_ssadd32, .vece = MO_32 }, + { .fno = gen_helper_gvec_ssadd64, .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +void tcg_gen_gvec_sssub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_sssub8, .vece = MO_8 }, + { .fno = gen_helper_gvec_sssub16, .vece = MO_16 }, + { .fno = gen_helper_gvec_sssub32, .vece = MO_32 }, + { .fno = gen_helper_gvec_sssub64, .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +static void tcg_gen_vec_usadd32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + TCGv_i32 max = tcg_const_i32(-1); + tcg_gen_add_i32(d, a, b); + tcg_gen_movcond_i32(TCG_COND_LTU, d, d, a, max, d); + tcg_temp_free_i32(max); +} + +static void tcg_gen_vec_usadd32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 max = tcg_const_i64(-1); + tcg_gen_add_i64(d, a, b); + tcg_gen_movcond_i64(TCG_COND_LTU, d, d, a, max, d); + tcg_temp_free_i64(max); +} + +void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_usadd8, .vece = MO_8 }, + { .fno = gen_helper_gvec_usadd16, .vece = MO_16 }, + { .fni4 = tcg_gen_vec_usadd32_i32, + .fno = gen_helper_gvec_usadd32, + .vece = MO_32 }, + { .fni8 = tcg_gen_vec_usadd32_i64, + .fno = gen_helper_gvec_usadd64, + .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + +static void tcg_gen_vec_ussub32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + TCGv_i32 min = tcg_const_i32(0); + tcg_gen_sub_i32(d, a, b); + tcg_gen_movcond_i32(TCG_COND_LTU, d, a, b, min, d); + tcg_temp_free_i32(min); +} + +static void tcg_gen_vec_ussub32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + TCGv_i64 min = tcg_const_i64(0); + tcg_gen_sub_i64(d, a, b); + tcg_gen_movcond_i64(TCG_COND_LTU, d, a, b, min, d); + tcg_temp_free_i64(min); +} + +void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t bofs, uint32_t opsz, uint32_t maxsz) +{ + static const GVecGen3 g[4] = { + { .fno = gen_helper_gvec_ussub8, .vece = MO_8 }, + { .fno = gen_helper_gvec_ussub16, .vece = MO_16 }, + { .fni4 = tcg_gen_vec_ussub32_i32, + .fno = gen_helper_gvec_ussub32, + .vece = MO_32 }, + { .fni8 = tcg_gen_vec_ussub32_i64, + .fno = gen_helper_gvec_ussub64, + .vece = MO_64 } + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3(dofs, aofs, bofs, opsz, maxsz, &g[vece]); +} + /* Perform a vector negation using normal negation and a mask. Compare gen_subv_mask above. */ static void gen_negv_mask(TCGv_i64 d, TCGv_i64 b, TCGv_i64 m) From patchwork Sat Jan 6 03:13:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123619 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp130602qgn; Fri, 5 Jan 2018 19:28:46 -0800 (PST) X-Google-Smtp-Source: ACJfBostL/I0edpVMNg0ZSBkuiGlZouhaeGJsDgoYyozpFXPBRuDGtBzuy0trdafC2gCVJBtoaH1 X-Received: by 10.129.48.139 with SMTP id w133mr155960yww.111.1515209326017; Fri, 05 Jan 2018 19:28:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209326; cv=none; d=google.com; s=arc-20160816; b=caDVTtWwQwApyWuTDh4Zm/60ne2qQQX27S002AnykkdxWcOpE4dkzwsManlW9wA9DI 3FP4cc3fIK7W6+au1eRdUDaxbyLjRCL23CMT7cnv82PxoxbAuE+cmbeKHO/Xk06pCKCX AzHgCowlKBrjo3dU4zVBZC5jZ9gjn3eeOJ8nk7UdlQZWw+RPo01D8v1DJr1ZTw+RoYOL FgWh7c6R33IoXteL+zFJqBMnWVzHlsGq8L9ElWjETd/A4UKOmVo6p2/9YRJw7gmJV39W qHqnnhee/X5A/yHM0rBFKRVviKOgQEI/H+pwObIR5y94y5NZvVCuGsLl3hYf+EdnN2NM 2Lvw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=McRr+rCDkbHUf7L9LisIOkM82JeD/YX+GQGWIdxJppA=; b=vgpfQtjzL9oCacbL5XCeRs6bYoZyK6seOpbPMyfHumpZsa+CH+du0dLAS1sKk9qNPh 1jojPRwZY6iquKcG4HY2TbYN3wsvajwKJbo/m/qHnUfcb+oEdeetIQD9cZ/EKN7ZE9+7 UF8FjZltoVbHXxKz6HJ/t/qmcXg3W5qAfDvE1WbLEesJYU2ZsU5yhbdVDiimhGJ4f3Rj tp9HAd/SGDIbi/Hq4c65phD4wrIwjZ0noo/mR5bf1IyKBVIqytgK83qRaG2/k1zCwDXV Mjw677K+ayLbkGbBhVrHk6x268SpkPqNxejBe8FT8NIqsIarIaab5oFAwE04Az02K2WC 820Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Xr2d97YT; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id s64si874187ywd.79.2018.01.05.19.28.45 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:28:46 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Xr2d97YT; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44111 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXfA5-00036b-Ct for patch@linaro.org; Fri, 05 Jan 2018 22:28:45 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48357) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXevz-0007Z8-Ov for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:15 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevx-0005Cq-8G for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:11 -0500 Received: from mail-pf0-x244.google.com ([2607:f8b0:400e:c00::244]:32978) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevw-0005As-Ty for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:09 -0500 Received: by mail-pf0-x244.google.com with SMTP id y89so2973684pfk.0 for ; Fri, 05 Jan 2018 19:14:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=McRr+rCDkbHUf7L9LisIOkM82JeD/YX+GQGWIdxJppA=; b=Xr2d97YTtRs9HPvlcLhPNB9rshiotZWKXTO1Vfd32yUnQl41zXNY4oC4t4xkJJ2E5/ 4C7MkbFQildSop9a9NxhuBokta6pyKtOmahonGX2XKP1qWN0UcZqCejUidnPXCS9rP17 THShN7mny61UZ07FtFv1OiF0kXozrz+oPnnZY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=McRr+rCDkbHUf7L9LisIOkM82JeD/YX+GQGWIdxJppA=; b=LEstjiRy31QQE9xU39w3sqECQ+wb3PYc1xIXYF9mKfJi5R7JwQW94nRWZrhu48toO1 1pJKHeV9LUwVa0r+qSuMNmJ8+MLbiPZB0PapefCJCuM+gS1rNQKrs9HL5HcN9lBleO73 3Uc3qDjvDfqrYnrDrvq0M3ZtlBENKTMYundKy5gLdxZxOE6G+vcjV9VVGAEI3Y9FaE21 XySS7ct7HZHyW7x7bFOqg4KBsAyNwQveoH+u7sgekeHbfivcTiiUPGL8zij5C88nxtht HB3f01RAY8eMUarFBxZRG7sfIDCmwBciGjAmiIO/lTueabB81+dzw6GUUPNd9SoPEn5N LgJg== X-Gm-Message-State: AKGB3mK0mtx+i29SvNwh2/0aQnOM8elIdWv89fEB2oUTTm2teVd5iWCY KySDtVKcms1VGBKGIyt1/aEu4fuQRNw= X-Received: by 10.98.4.1 with SMTP id 1mr4760109pfe.19.1515208447524; Fri, 05 Jan 2018 19:14:07 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.06 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:06 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:34 -0800 Message-Id: <20180106031346.6650-12-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::244 Subject: [Qemu-devel] [PATCH v8 11/23] tcg: Add generic vector helpers with a scalar immediate operand X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" We already have immediate shifts. Add addition, multiplication, and logical operations with an immediate. Subtraction can thus be done with negation of the constant. Signed-off-by: Richard Henderson --- accel/tcg/tcg-runtime.h | 14 ++++ tcg/tcg-op-gvec.h | 22 ++++- accel/tcg/tcg-runtime-gvec.c | 136 +++++++++++++++++++++++++++++++ tcg/tcg-op-gvec.c | 186 ++++++++++++++++++++++++++++++++++++++++++- 4 files changed, 356 insertions(+), 2 deletions(-) -- 2.14.3 diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h index ec187a094b..cf0b5e0d9a 100644 --- a/accel/tcg/tcg-runtime.h +++ b/accel/tcg/tcg-runtime.h @@ -147,6 +147,11 @@ DEF_HELPER_FLAGS_4(gvec_add16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_add32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_add64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_addi8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_addi16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_addi32, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) +DEF_HELPER_FLAGS_4(gvec_addi64, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) + DEF_HELPER_FLAGS_4(gvec_sub8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sub16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sub32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) @@ -157,6 +162,11 @@ DEF_HELPER_FLAGS_4(gvec_mul16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_muli8, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_muli16, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_muli32, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) +DEF_HELPER_FLAGS_4(gvec_muli64, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) + DEF_HELPER_FLAGS_4(gvec_ssadd8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ssadd16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ssadd32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) @@ -189,6 +199,10 @@ DEF_HELPER_FLAGS_4(gvec_xor, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_andc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_orc, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_andi, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) +DEF_HELPER_FLAGS_4(gvec_xori, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) +DEF_HELPER_FLAGS_4(gvec_ori, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) + DEF_HELPER_FLAGS_3(gvec_shl8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_shl16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_shl32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h index bcdf0eb413..a27a2eea87 100644 --- a/tcg/tcg-op-gvec.h +++ b/tcg/tcg-op-gvec.h @@ -35,6 +35,12 @@ void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz, int32_t data, gen_helper_gvec_2 *fn); +/* Similarly, passing an extra data value. */ +typedef void gen_helper_gvec_2i(TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_i32); +void tcg_gen_gvec_2i_ool(uint32_t dofs, uint32_t aofs, TCGv_i64 c, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_2i *fn); + /* Similarly, passing an extra pointer (e.g. env or float_status). */ typedef void gen_helper_gvec_2_ptr(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32); void tcg_gen_gvec_2_ptr(uint32_t dofs, uint32_t aofs, @@ -102,8 +108,10 @@ typedef struct { void (*fni4)(TCGv_i32, TCGv_i32, int32_t); /* Expand inline with a host vector type. */ void (*fniv)(unsigned, TCGv_vec, TCGv_vec, int64_t); - /* Expand out-of-line helper w/descriptor. */ + /* Expand out-of-line helper w/descriptor, data in descriptor. */ gen_helper_gvec_2 *fno; + /* Expand out-of-line helper w/descriptor, data as argument. */ + gen_helper_gvec_2i *fnoi; /* The opcode, if any, to which this corresponds. */ TCGOpcode opc; /* The vector element size, if applicable. */ @@ -179,6 +187,11 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_addi(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_muli(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz); + /* Saturated arithmetic. */ void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t clsz); @@ -200,6 +213,13 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs, void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_andi(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_xori(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz); +void tcg_gen_gvec_ori(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz); + void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t s, uint32_t m); void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s, diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c index e84c900670..370375c49e 100644 --- a/accel/tcg/tcg-runtime-gvec.c +++ b/accel/tcg/tcg-runtime-gvec.c @@ -97,6 +97,56 @@ void HELPER(gvec_add64)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_addi8)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + uint8_t b = simd_data(desc); + vec8 vecb = (vec8){ b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) + vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_addi16)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + uint16_t b = simd_data(desc); + vec16 vecb = (vec16){ b, b, b, b, b, b, b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) + vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_addi32)(void *d, void *a, uint64_t b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + vec32 vecb = (vec32){ b, b, b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) + vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_addi64)(void *d, void *a, uint64_t b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + vec64 vecb = (vec64){ b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) + vecb; + } + clear_high(d, oprsz, desc); +} + void HELPER(gvec_sub8)(void *d, void *a, void *b, uint32_t desc) { intptr_t oprsz = simd_oprsz(desc); @@ -185,6 +235,56 @@ void HELPER(gvec_mul64)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_muli8)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + uint8_t b = simd_data(desc); + vec8 vecb = (vec8){ b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec8)) { + *(vec8 *)(d + i) = *(vec8 *)(a + i) * vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_muli16)(void *d, void *a, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + uint16_t b = simd_data(desc); + vec16 vecb = (vec16){ b, b, b, b, b, b, b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec16)) { + *(vec16 *)(d + i) = *(vec16 *)(a + i) * vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_muli32)(void *d, void *a, uint64_t b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + vec32 vecb = (vec32){ b, b, b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec32)) { + *(vec32 *)(d + i) = *(vec32 *)(a + i) * vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_muli64)(void *d, void *a, uint64_t b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + vec64 vecb = (vec64){ b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) * vecb; + } + clear_high(d, oprsz, desc); +} + void HELPER(gvec_neg8)(void *d, void *a, uint32_t desc) { intptr_t oprsz = simd_oprsz(desc); @@ -343,6 +443,42 @@ void HELPER(gvec_orc)(void *d, void *a, void *b, uint32_t desc) clear_high(d, oprsz, desc); } +void HELPER(gvec_andi)(void *d, void *a, uint64_t b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + vec64 vecb = (vec64){ b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) & vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_xori)(void *d, void *a, uint64_t b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + vec64 vecb = (vec64){ b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) ^ vecb; + } + clear_high(d, oprsz, desc); +} + +void HELPER(gvec_ori)(void *d, void *a, uint64_t b, uint32_t desc) +{ + intptr_t oprsz = simd_oprsz(desc); + vec64 vecb = (vec64){ b, b }; + intptr_t i; + + for (i = 0; i < oprsz; i += sizeof(vec64)) { + *(vec64 *)(d + i) = *(vec64 *)(a + i) | vecb; + } + clear_high(d, oprsz, desc); +} + void HELPER(gvec_shl8i)(void *d, void *a, uint32_t desc) { intptr_t oprsz = simd_oprsz(desc); diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 39799f9c3e..fc13dd53c8 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -106,6 +106,28 @@ void tcg_gen_gvec_2_ool(uint32_t dofs, uint32_t aofs, tcg_temp_free_i32(desc); } +/* Generate a call to a gvec-style helper with two vector operands + and one scalar operand. */ +void tcg_gen_gvec_2i_ool(uint32_t dofs, uint32_t aofs, TCGv_i64 c, + uint32_t oprsz, uint32_t maxsz, int32_t data, + gen_helper_gvec_2i *fn) +{ + TCGv_ptr a0, a1; + TCGv_i32 desc = tcg_const_i32(simd_desc(oprsz, maxsz, data)); + + a0 = tcg_temp_new_ptr(); + a1 = tcg_temp_new_ptr(); + + tcg_gen_addi_ptr(a0, cpu_env, dofs); + tcg_gen_addi_ptr(a1, cpu_env, aofs); + + fn(a0, a1, c, desc); + + tcg_temp_free_ptr(a0); + tcg_temp_free_ptr(a1); + tcg_temp_free_i32(desc); +} + /* Generate a call to a gvec-style helper with three vector operands. */ void tcg_gen_gvec_3_ool(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t oprsz, uint32_t maxsz, int32_t data, @@ -922,7 +944,13 @@ void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz, } do_ool: - tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, c, g->fno); + if (g->fno) { + tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, c, g->fno); + } else { + TCGv_i64 tcg_c = tcg_const_i64(c); + tcg_gen_gvec_2i_ool(dofs, aofs, tcg_c, oprsz, maxsz, c, g->fnoi); + tcg_temp_free_i64(tcg_c); + } } /* Expand a vector three-operand operation. */ @@ -1325,6 +1353,59 @@ void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); } +static void tcg_gen_vec_addi8_i64(TCGv_i64 d, TCGv_i64 a, int64_t b) +{ + TCGv_i64 t = tcg_const_i64((b & 0xff) * (-1ull / 0xff)); + tcg_gen_vec_add8_i64(d, a, t); + tcg_temp_free_i64(t); +} + +static void tcg_gen_vec_addi16_i64(TCGv_i64 d, TCGv_i64 a, int64_t b) +{ + TCGv_i64 t = tcg_const_i64((b & 0xffff) * (-1ull / 0xffff)); + tcg_gen_vec_add16_i64(d, a, t); + tcg_temp_free_i64(t); +} + +static void tcg_gen_addi_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t b) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + tcg_gen_dupi_vec(vece, t, b); + tcg_gen_add_vec(vece, d, a, t); + tcg_temp_free_vec(t); +} + +void tcg_gen_gvec_addi(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2i g[4] = { + { .fni8 = tcg_gen_vec_addi8_i64, + .fniv = tcg_gen_addi_vec, + .fno = gen_helper_gvec_addi8, + .opc = INDEX_op_add_vec, + .vece = MO_8 }, + { .fni8 = tcg_gen_vec_addi16_i64, + .fniv = tcg_gen_addi_vec, + .fno = gen_helper_gvec_addi16, + .opc = INDEX_op_add_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_addi_i32, + .fniv = tcg_gen_addi_vec, + .fnoi = gen_helper_gvec_addi32, + .opc = INDEX_op_add_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_addi_i64, + .fniv = tcg_gen_addi_vec, + .fnoi = gen_helper_gvec_addi64, + .opc = INDEX_op_add_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &g[vece]); +} + /* Perform a vector subtraction using normal subtraction and a mask. Compare gen_addv_mask above. */ static void gen_subv_mask(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 m) @@ -1433,6 +1514,43 @@ void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]); } +static void tcg_gen_muli_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t b) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + tcg_gen_dupi_vec(vece, t, b); + tcg_gen_mul_vec(vece, d, a, t); + tcg_temp_free_vec(t); +} + +void tcg_gen_gvec_muli(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2i g[4] = { + { .fniv = tcg_gen_muli_vec, + .fno = gen_helper_gvec_muli8, + .opc = INDEX_op_mul_vec, + .vece = MO_8 }, + { .fniv = tcg_gen_muli_vec, + .fno = gen_helper_gvec_muli16, + .opc = INDEX_op_mul_vec, + .vece = MO_16 }, + { .fni4 = tcg_gen_muli_i32, + .fniv = tcg_gen_muli_vec, + .fnoi = gen_helper_gvec_muli32, + .opc = INDEX_op_mul_vec, + .vece = MO_32 }, + { .fni8 = tcg_gen_muli_i64, + .fniv = tcg_gen_muli_vec, + .fnoi = gen_helper_gvec_muli64, + .opc = INDEX_op_mul_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &g[vece]); +} + void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t opsz, uint32_t maxsz) { @@ -1665,6 +1783,72 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g); } +static void tcg_gen_andi_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t b) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + tcg_gen_dupi_vec(vece, t, b); + tcg_gen_and_vec(vece, d, a, t); + tcg_temp_free_vec(t); +} + +void tcg_gen_gvec_andi(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2i g = { + .fni8 = tcg_gen_andi_i64, + .fniv = tcg_gen_andi_vec, + .fnoi = gen_helper_gvec_andi, + .opc = INDEX_op_and_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 + }; + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &g); +} + +static void tcg_gen_xori_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t b) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + tcg_gen_dupi_vec(vece, t, b); + tcg_gen_xor_vec(vece, d, a, t); + tcg_temp_free_vec(t); +} + +void tcg_gen_gvec_xori(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2i g = { + .fni8 = tcg_gen_xori_i64, + .fniv = tcg_gen_xori_vec, + .fnoi = gen_helper_gvec_xori, + .opc = INDEX_op_xor_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 + }; + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &g); +} + +static void tcg_gen_ori_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t b) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + tcg_gen_dupi_vec(vece, t, b); + tcg_gen_or_vec(vece, d, a, t); + tcg_temp_free_vec(t); +} + +void tcg_gen_gvec_ori(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz) +{ + static const GVecGen2i g = { + .fni8 = tcg_gen_ori_i64, + .fniv = tcg_gen_ori_vec, + .fnoi = gen_helper_gvec_ori, + .opc = INDEX_op_or_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 + }; + tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, c, &g); +} + void tcg_gen_vec_shl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c) { uint64_t mask = ((0xff << c) & 0xff) * (-1ull / 0xff); From patchwork Sat Jan 6 03:13:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123622 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp133350qgn; Fri, 5 Jan 2018 19:32:44 -0800 (PST) X-Google-Smtp-Source: ACJfBotF7RpMHbtW4epX4r9nepXVZPI9sJphTNO5y5kqr9MPeBKyxX++6SjX7Myok0JxPRvUo3q/ X-Received: by 10.13.206.193 with SMTP id q184mr4625933ywd.326.1515209564775; Fri, 05 Jan 2018 19:32:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209564; cv=none; d=google.com; s=arc-20160816; b=HND7PGpEmNM38x/UU/moTsjnUXu5HvdACfKxkCVX4T/5tOCOnp++pBjWlbVIafwOQU I2XtBJ9o+eI5xyL2eg1+yLiQ8az/jewerQlju6HWNFgnza9yefJTF/LP2nsv+ElmSgic xWBj/YBp6G7f9O2HbJZqI05dhhjQwLQ2FYnr52k4y6Lel0uapfG9UA5TQ0oDB5bfu/pn K5IQFjL9k7FZgTIh664Sav35eauCR3OUc7Q1kSfJ/nOcrY6S3CR3bPklEcdEeqpYZM90 acIh0ogmRsqyJpctTafwVFLeB/mQDcXwgQgu1BDK3qPVVuuuGCMYpqiQ/DHrpivNReLg FjHQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=ecSxpiLajjqbroC9M5HiLYa4/wffUO6dfZeKFuckgCw=; b=LVnut8FQL7FPKp3s3gOm3uoxDenpl2VKLZ+/Yz0aBM+aNa4CIqudkuAae0Wjmo+B0C wtQsGLNIMjjdpuJBszxYppvfMnZvMoKueJUdXR6Ry+ohpgzg72K8vc+IB4TjY2ZxtKz4 ArM1J7amRbBDn9ew7M7H9aNZcZkXrScZ5bMS7qv+kBls8yH4E0nOOtahq3CwpWLtcEZb pIAWh4v+aFinzLyaQQSPwfeIly4Zix4VKBTOwmSrmb9U7xdnAFMeWIalEY8AglZg5MRE F8V+7NF7z8RzDKlfVWfiQeQu01dcQzartLFEl2Ihx9hoRaqad9FJgPPgydaw60DuzI1J ourQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=JC1bcBVY; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id c1si89583ybl.302.2018.01.05.19.32.44 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:32:44 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=JC1bcBVY; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44454 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXfDw-0006Le-9H for patch@linaro.org; Fri, 05 Jan 2018 22:32:44 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48368) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew0-0007Zk-88 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:15 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevy-0005GI-J6 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:12 -0500 Received: from mail-pl0-x244.google.com ([2607:f8b0:400e:c01::244]:39891) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevy-0005EV-AH for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:10 -0500 Received: by mail-pl0-x244.google.com with SMTP id bi12so4225211plb.6 for ; Fri, 05 Jan 2018 19:14:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=ecSxpiLajjqbroC9M5HiLYa4/wffUO6dfZeKFuckgCw=; b=JC1bcBVYSW0/K7Lu9JGXiW7Y8s424iWF49Xp5HA42B2PQZoOur64VZUMlH6ufpBZlf gR4B6vz80zOcORf8fVeD42WT+civIioA4V3DbndRKpAz7KLPF/VGENGION42sy8XDkpL LjR6KdN5UcmS8tFewJpXBLHa+Ek1PCRxkly1A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=ecSxpiLajjqbroC9M5HiLYa4/wffUO6dfZeKFuckgCw=; b=QgjWhjMU7RnjrfBd+d2Cyrs+nxwWid3gkgS6j88rEHGaA6sK5L1Q7wGUmmWOYHROsV gfQhkfawgCp43QNP98v6L+Jk5QIbhIfL9HxaEL/soqYPpJ1m11t6KksD5EHsSTh+juSR JgCHkoqbUuiLXVoVoXbrRI1HrEHsr3mUSxjcgLYDSL5RNhUU4Mc6p6CcSpQYYtQ95vrz dBuoRXEa0aIWIoQ7TLjcZjLuGZJIl5V2IPuWw+Aulm+lqLEsntnnxVA3qT0z5H9GFXoj P/c9US7Zsox+4N001cSD02mpfX7z6sFkWy1iri7ioEbMSsRSlj0o/Jthyw45qbIOjI+E R6rQ== X-Gm-Message-State: AKGB3mJ13Irh0qEKoPuEclNunV6p8gztWtTrvVK0on6XN9h67UJP+U2b BkLaegNLYziOLxaMIlaaVzUrGjoCef8= X-Received: by 10.84.242.150 with SMTP id d22mr5149212pll.136.1515208449115; Fri, 05 Jan 2018 19:14:09 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.07 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:07 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:35 -0800 Message-Id: <20180106031346.6650-13-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c01::244 Subject: [Qemu-devel] [PATCH v8 12/23] tcg/optimize: Handle vector opcodes during optimize X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Trivial move and constant propagation. Some identity and constant function folding, but nothing that requires knowledge of the size of the vector element. Signed-off-by: Richard Henderson --- tcg/optimize.c | 141 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 68 insertions(+), 73 deletions(-) -- 2.14.3 diff --git a/tcg/optimize.c b/tcg/optimize.c index 2cbbeefd53..f8ddf4dd33 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -32,6 +32,11 @@ glue(glue(case INDEX_op_, x), _i32): \ glue(glue(case INDEX_op_, x), _i64) +#define CASE_OP_32_64_VEC(x) \ + glue(glue(case INDEX_op_, x), _i32): \ + glue(glue(case INDEX_op_, x), _i64): \ + glue(glue(case INDEX_op_, x), _vec) + struct tcg_temp_info { bool is_const; TCGTemp *prev_copy; @@ -108,40 +113,6 @@ static void init_arg_info(struct tcg_temp_info *infos, init_ts_info(infos, temps_used, arg_temp(arg)); } -static int op_bits(TCGOpcode op) -{ - const TCGOpDef *def = &tcg_op_defs[op]; - return def->flags & TCG_OPF_64BIT ? 64 : 32; -} - -static TCGOpcode op_to_mov(TCGOpcode op) -{ - switch (op_bits(op)) { - case 32: - return INDEX_op_mov_i32; - case 64: - return INDEX_op_mov_i64; - default: - fprintf(stderr, "op_to_mov: unexpected return value of " - "function op_bits.\n"); - tcg_abort(); - } -} - -static TCGOpcode op_to_movi(TCGOpcode op) -{ - switch (op_bits(op)) { - case 32: - return INDEX_op_movi_i32; - case 64: - return INDEX_op_movi_i64; - default: - fprintf(stderr, "op_to_movi: unexpected return value of " - "function op_bits.\n"); - tcg_abort(); - } -} - static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts) { TCGTemp *i; @@ -199,11 +170,23 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2) static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val) { - TCGOpcode new_op = op_to_movi(op->opc); + const TCGOpDef *def; + TCGOpcode new_op; tcg_target_ulong mask; struct tcg_temp_info *di = arg_info(dst); + def = &tcg_op_defs[op->opc]; + if (def->flags & TCG_OPF_VECTOR) { + new_op = INDEX_op_dupi_vec; + } else if (def->flags & TCG_OPF_64BIT) { + new_op = INDEX_op_movi_i64; + } else { + new_op = INDEX_op_movi_i32; + } op->opc = new_op; + /* TCGOP_VECL and TCGOP_VECE remain unchanged. */ + op->args[0] = dst; + op->args[1] = val; reset_temp(dst); di->is_const = true; @@ -214,15 +197,13 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val) mask |= ~0xffffffffull; } di->mask = mask; - - op->args[0] = dst; - op->args[1] = val; } static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src) { TCGTemp *dst_ts = arg_temp(dst); TCGTemp *src_ts = arg_temp(src); + const TCGOpDef *def; struct tcg_temp_info *di; struct tcg_temp_info *si; tcg_target_ulong mask; @@ -236,9 +217,16 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src) reset_ts(dst_ts); di = ts_info(dst_ts); si = ts_info(src_ts); - new_op = op_to_mov(op->opc); - + def = &tcg_op_defs[op->opc]; + if (def->flags & TCG_OPF_VECTOR) { + new_op = INDEX_op_mov_vec; + } else if (def->flags & TCG_OPF_64BIT) { + new_op = INDEX_op_mov_i64; + } else { + new_op = INDEX_op_mov_i32; + } op->opc = new_op; + /* TCGOP_VECL and TCGOP_VECE remain unchanged. */ op->args[0] = dst; op->args[1] = src; @@ -417,8 +405,9 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y) static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y) { + const TCGOpDef *def = &tcg_op_defs[op]; TCGArg res = do_constant_folding_2(op, x, y); - if (op_bits(op) == 32) { + if (!(def->flags & TCG_OPF_64BIT)) { res = (int32_t)res; } return res; @@ -508,13 +497,12 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x, tcg_target_ulong xv = arg_info(x)->val; tcg_target_ulong yv = arg_info(y)->val; if (arg_is_const(x) && arg_is_const(y)) { - switch (op_bits(op)) { - case 32: - return do_constant_folding_cond_32(xv, yv, c); - case 64: + const TCGOpDef *def = &tcg_op_defs[op]; + tcg_debug_assert(!(def->flags & TCG_OPF_VECTOR)); + if (def->flags & TCG_OPF_64BIT) { return do_constant_folding_cond_64(xv, yv, c); - default: - tcg_abort(); + } else { + return do_constant_folding_cond_32(xv, yv, c); } } else if (args_are_copies(x, y)) { return do_constant_folding_cond_eq(c); @@ -653,11 +641,11 @@ void tcg_optimize(TCGContext *s) /* For commutative operations make constant second argument */ switch (opc) { - CASE_OP_32_64(add): - CASE_OP_32_64(mul): - CASE_OP_32_64(and): - CASE_OP_32_64(or): - CASE_OP_32_64(xor): + CASE_OP_32_64_VEC(add): + CASE_OP_32_64_VEC(mul): + CASE_OP_32_64_VEC(and): + CASE_OP_32_64_VEC(or): + CASE_OP_32_64_VEC(xor): CASE_OP_32_64(eqv): CASE_OP_32_64(nand): CASE_OP_32_64(nor): @@ -722,7 +710,7 @@ void tcg_optimize(TCGContext *s) continue; } break; - CASE_OP_32_64(sub): + CASE_OP_32_64_VEC(sub): { TCGOpcode neg_op; bool have_neg; @@ -734,9 +722,12 @@ void tcg_optimize(TCGContext *s) if (opc == INDEX_op_sub_i32) { neg_op = INDEX_op_neg_i32; have_neg = TCG_TARGET_HAS_neg_i32; - } else { + } else if (opc == INDEX_op_sub_i64) { neg_op = INDEX_op_neg_i64; have_neg = TCG_TARGET_HAS_neg_i64; + } else { + neg_op = INDEX_op_neg_vec; + have_neg = TCG_TARGET_HAS_neg_vec; } if (!have_neg) { break; @@ -750,7 +741,7 @@ void tcg_optimize(TCGContext *s) } } break; - CASE_OP_32_64(xor): + CASE_OP_32_64_VEC(xor): CASE_OP_32_64(nand): if (!arg_is_const(op->args[1]) && arg_is_const(op->args[2]) @@ -767,7 +758,7 @@ void tcg_optimize(TCGContext *s) goto try_not; } break; - CASE_OP_32_64(andc): + CASE_OP_32_64_VEC(andc): if (!arg_is_const(op->args[2]) && arg_is_const(op->args[1]) && arg_info(op->args[1])->val == -1) { @@ -775,7 +766,7 @@ void tcg_optimize(TCGContext *s) goto try_not; } break; - CASE_OP_32_64(orc): + CASE_OP_32_64_VEC(orc): CASE_OP_32_64(eqv): if (!arg_is_const(op->args[2]) && arg_is_const(op->args[1]) @@ -789,7 +780,10 @@ void tcg_optimize(TCGContext *s) TCGOpcode not_op; bool have_not; - if (def->flags & TCG_OPF_64BIT) { + if (def->flags & TCG_OPF_VECTOR) { + not_op = INDEX_op_not_vec; + have_not = TCG_TARGET_HAS_not_vec; + } else if (def->flags & TCG_OPF_64BIT) { not_op = INDEX_op_not_i64; have_not = TCG_TARGET_HAS_not_i64; } else { @@ -810,16 +804,16 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, const => mov r, a" cases */ switch (opc) { - CASE_OP_32_64(add): - CASE_OP_32_64(sub): + CASE_OP_32_64_VEC(add): + CASE_OP_32_64_VEC(sub): + CASE_OP_32_64_VEC(or): + CASE_OP_32_64_VEC(xor): + CASE_OP_32_64_VEC(andc): CASE_OP_32_64(shl): CASE_OP_32_64(shr): CASE_OP_32_64(sar): CASE_OP_32_64(rotl): CASE_OP_32_64(rotr): - CASE_OP_32_64(or): - CASE_OP_32_64(xor): - CASE_OP_32_64(andc): if (!arg_is_const(op->args[1]) && arg_is_const(op->args[2]) && arg_info(op->args[2])->val == 0) { @@ -827,8 +821,8 @@ void tcg_optimize(TCGContext *s) continue; } break; - CASE_OP_32_64(and): - CASE_OP_32_64(orc): + CASE_OP_32_64_VEC(and): + CASE_OP_32_64_VEC(orc): CASE_OP_32_64(eqv): if (!arg_is_const(op->args[1]) && arg_is_const(op->args[2]) @@ -1042,8 +1036,8 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, 0 => movi r, 0" cases */ switch (opc) { - CASE_OP_32_64(and): - CASE_OP_32_64(mul): + CASE_OP_32_64_VEC(and): + CASE_OP_32_64_VEC(mul): CASE_OP_32_64(muluh): CASE_OP_32_64(mulsh): if (arg_is_const(op->args[2]) @@ -1058,8 +1052,8 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, a => mov r, a" cases */ switch (opc) { - CASE_OP_32_64(or): - CASE_OP_32_64(and): + CASE_OP_32_64_VEC(or): + CASE_OP_32_64_VEC(and): if (args_are_copies(op->args[1], op->args[2])) { tcg_opt_gen_mov(s, op, op->args[0], op->args[1]); continue; @@ -1071,9 +1065,9 @@ void tcg_optimize(TCGContext *s) /* Simplify expression for "op r, a, a => movi r, 0" cases */ switch (opc) { - CASE_OP_32_64(andc): - CASE_OP_32_64(sub): - CASE_OP_32_64(xor): + CASE_OP_32_64_VEC(andc): + CASE_OP_32_64_VEC(sub): + CASE_OP_32_64_VEC(xor): if (args_are_copies(op->args[1], op->args[2])) { tcg_opt_gen_movi(s, op, op->args[0], 0); continue; @@ -1087,10 +1081,11 @@ void tcg_optimize(TCGContext *s) folding. Constants will be substituted to arguments by register allocator where needed and possible. Also detect copies. */ switch (opc) { - CASE_OP_32_64(mov): + CASE_OP_32_64_VEC(mov): tcg_opt_gen_mov(s, op, op->args[0], op->args[1]); break; CASE_OP_32_64(movi): + case INDEX_op_dupi_vec: tcg_opt_gen_movi(s, op, op->args[0], op->args[1]); break; From patchwork Sat Jan 6 03:13:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123605 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp123757qgn; Fri, 5 Jan 2018 19:17:46 -0800 (PST) X-Google-Smtp-Source: ACJfBotve/G1fk7JXETwO8z3DBvBUmBQSZlOn/01iAahodvOlBDh+BYLiTsuCQdTZGWQWathsBFz X-Received: by 10.37.118.67 with SMTP id r64mr4768183ybc.328.1515208666430; Fri, 05 Jan 2018 19:17:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208666; cv=none; d=google.com; s=arc-20160816; b=hCUcyEgw2TAdeb1HX2/kF/jNOba7urHFjgWVjm1Pc26y7/CF2zjURU00oLFlA/argl wwL5t+dRwJOY/XGoIf7wyx+1fP+9wRh+ev4/gPm8nvvGsm/Jubg9WJ8fMO5aPjhodB8I JqIzvdVbPerJmM/v9lidxweqRok6+n7YyXXcrXJuFBXyrSqN6JSXKK81M7XJ9z9EdgeJ b8R4H7D3VTTcRmPSIEITdOVQXsnX60RQsxLo0YFDFo8bKdwHf8UxQlHnwTZYwbMkaSbz oHCSY8e59x/CSWpLgZA8N4oMleo8vfrMl3Wu2YAtkac5XYkfVFk17PwpexcEKNGIA7Cg V1Hg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=eg2BI0DBNuDqW21c7Zem30kPpi4s/mmmuiwc5wTxx90=; b=rbmboy6lW24ZtWghKRw9Q1SfJGpCXZvQsvxE1X2DeI6dd+iiMsVaSBtcF0OyuI1ird Z3hpGWkZBy8DsmDJU/cdj/gvUtzwcodBBHvOZ31hA7JRl3ZtIcjtdOVFuIIKcF9X4jmT p/W5K7+NrNyjY3v+BK+0ZrQAiBqla+FVbT/X6vv5Nw7CZseCYTtDoAlkWTPXSPh9SQeL OpLVVUdhPjes/PfdsocI1UFO/WiH6ZP3S7O1RIXlb4Mqz3GyT1N4Ja+qCp70+fUYujLc a0nnkOndubZrO31abh222djPRWYjEd3PG1KztgsLV6lKM8okxXvUsU4wZIA7wrDHymcw Ixlg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=fW5GkgHs; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id b8si1465965ybn.721.2018.01.05.19.17.46 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:17:46 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=fW5GkgHs; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44045 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXezR-0001an-Rp for patch@linaro.org; Fri, 05 Jan 2018 22:17:45 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48400) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew3-0007co-Qc for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:16 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXevz-0005JM-S2 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:15 -0500 Received: from mail-pg0-x241.google.com ([2607:f8b0:400e:c05::241]:37044) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXevz-0005Hm-Ma for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:11 -0500 Received: by mail-pg0-x241.google.com with SMTP id o13so2712999pgp.4 for ; Fri, 05 Jan 2018 19:14:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=eg2BI0DBNuDqW21c7Zem30kPpi4s/mmmuiwc5wTxx90=; b=fW5GkgHsa/2FnDisR15ydjsOT0hLpH3PIkU54igr2A9PzsZZIq77tZUwmDtK5NceHF eOXayQGMa9wDyGf+DVyEuaRlWwgUhYjJVh09/2OOCu6SXV4gxpz5Bn5kf3DxERyWMamP jDBc492TtfuhY6z08dy2QvQCWqV1B4unrRC1o= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eg2BI0DBNuDqW21c7Zem30kPpi4s/mmmuiwc5wTxx90=; b=aGTZmY2AgNGk6Uv2YL/14X6OSaO4vaEhH7O4Dn83nj1c2urQSJ8UiTF3+4Bmx8quhd H4dgENu1Lw/ewu2uUKE4cLV4yRe4Y25mhzXViyEldmPSRGzLY3eLveOU6pU6JkGltGJS fu7l0xnu2D9WW0HTIOMDfzEKy+X++i3jSTT3HcPzHYL58GpO6d3RdBsPCNThNZYj9+Hl EN6KQtpy1MfaICyhs2SBWbpYDMyD5mkhGRwSlUmIKxoZXcnmDNS/bD2SVUR3cyBKrGXc cRaGuvnRzXWI86dPGjSanZDo1nfyba5tmlSQ4U59r9Zpm76ZescHv0Bhg2zVRYLG9IKc 55rw== X-Gm-Message-State: AKGB3mK6T4hog0zmgHMOHIC4aqgq3OdgJ5L7lu93xZ1gDrlvurNN1hQY kq7s8X+5M49O7I/bezov7WPnWW3UZ30= X-Received: by 10.101.82.131 with SMTP id y3mr4180326pgp.254.1515208450555; Fri, 05 Jan 2018 19:14:10 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.09 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:09 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:36 -0800 Message-Id: <20180106031346.6650-14-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::241 Subject: [Qemu-devel] [PATCH v8 13/23] target/arm: Align vector registers X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Alex Bennée Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/arm/cpu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.14.3 diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 96316700dd..3ff4dea6b8 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -492,7 +492,7 @@ typedef struct CPUARMState { * the two execution states, and means we do not need to explicitly * map these registers when changing states. */ - float64 regs[64]; + float64 regs[64] QEMU_ALIGNED(16); uint32_t xregs[16]; /* We store these fpcsr fields separately for convenience. */ From patchwork Sat Jan 6 03:13:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123614 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp128759qgn; Fri, 5 Jan 2018 19:25:41 -0800 (PST) X-Google-Smtp-Source: ACJfBovi9gyrVCswl5JmwarxnbJPOPG9ux30w2AN3Ogk7NZnIui7+xsQpEEuayhF4tlhSjYUoD9/ X-Received: by 10.129.75.23 with SMTP id y23mr4519571ywa.98.1515209141660; Fri, 05 Jan 2018 19:25:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209141; cv=none; d=google.com; s=arc-20160816; b=MjdWL/GUieA2UNRvx25MmY+suGjDWSigIEShTB36GenFVrI2j699qTKdcD7UyVQxZf 07kwusFcXzMp7xsvtHygEXQ3TIG3hcgzdou3reMkYoRRwXaXOWggr5F143PsfMk5npoX a8eEzed1p9k6JfMrbQNfiIIyi9fKId7aGpiPE7fBNocDsKcVM9ZK5YtB/wTWScOwVRyi cHXAwokN5tyRJbye7Vy6drtPHm6UUWW8XDh9i0xYoOSEuhl2z0LB7R0CyCMW9Ji/T6xn D/rz6r3x9LOY2jfXbmBh0UISo2HVy+e4qUkOVfRNZb1RAxTqNs3K8i050OH2xboSSpOB X+JA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=Ua176QlergBBCigQAYx4Bh/qGHxv4J7O/KNvAEHjYFo=; b=sESJHmYblbobM8XIEQpiPBoX31U0o6s7OFGRVI4kjKq/66AVVbXi3fis/Trftp+A2z 5Xx+p8hlcK7t/ciILlQvDIAn6nEzLGkDu8v43kJj21jWFF6vV+vXX6cz4hFOer+jzxQP qeG1vQc3oaf6pYSrSIeI0nJeuzHxjIzBZ7YzlDAlb7z5p5EIYst7XI+TddphKtJ5IBYy SzgYVMHh05Ij+CF8Yex2y5pn3D6XHMfknp/Ydd5nnKFIjhY24Ab9qK+SS4khNpwDCtMU PATvZstxF3BdHIawkeynF1AmW3zk8BXwD62UjOWvkJVWl0MXuXRD0aiarFvNmdRtc/Ju ZqmA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=L58s0wVU; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id l132si1451459ywc.172.2018.01.05.19.25.41 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:25:41 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=L58s0wVU; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44089 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf77-0007s8-4O for patch@linaro.org; Fri, 05 Jan 2018 22:25:41 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48401) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew3-0007cp-RD for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXew1-0005NO-KK for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:15 -0500 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:34346) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXew1-0005M0-Bq for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:13 -0500 Received: by mail-pf0-x241.google.com with SMTP id a90so2970442pfk.1 for ; Fri, 05 Jan 2018 19:14:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=Ua176QlergBBCigQAYx4Bh/qGHxv4J7O/KNvAEHjYFo=; b=L58s0wVUJMjLlvaogBo3lr9qI0Cbe4Ixep23AA494V24eGdnr+NUaXG2ZNaxt3vWYV 4NfwglkvevNPjA2w0cQ1Om1vGG1T2gcw9RwlFR3821HENZYSZM3phFQ+vst00PVjtqVd 1TuuMjJ18pD7rmaFABwKwvvgQOSZeee34Ha10= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ua176QlergBBCigQAYx4Bh/qGHxv4J7O/KNvAEHjYFo=; b=eplsI7vtv9xIEINxNXpA5LiIBPONW4bVHZSLcZREhHTaOVecPe8MK8ssXYzjNTms7A VibckVIao+5xDI5IYnOpsQ5FjhjXHiMxUO/gxbXYIvLwlZIrfzp1lGY8Xx6vFpxjwcX/ SlHFffxvQCGwdGqJQCmfDqzITbgy4cihcg6NL2vaxurAND56ImhzO3r25eNfrdEu5iQm jDYyP1irTwnXC9kfR9aOOm2jZMNAkpilwkbFZxPVpy9CXAg+TMJbtIZlEqkx31d7aXJK DAfHdH1rQ+vGp0j6jyx0PV2QwGfcrRlNiu4vt8UWqvjtR9w5M7714LxDwy03si8pn/lO D1KA== X-Gm-Message-State: AKGB3mJbCsNQNZpiya4N67MWF15atssJh7uqOmdSY87aZ3QGBhc7bE8M Mam6aWg4SnXpGPZJWp9dshUWozksy2o= X-Received: by 10.99.106.138 with SMTP id f132mr4156214pgc.115.1515208452099; Fri, 05 Jan 2018 19:14:12 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.10 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:11 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:37 -0800 Message-Id: <20180106031346.6650-15-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v8 14/23] target/arm: Use vector infrastructure for aa64 add/sub/logic X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 207 +++++++++++++++++++++++++++++---------------- 1 file changed, 134 insertions(+), 73 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index ba94f7d045..572af456d1 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -21,6 +21,7 @@ #include "cpu.h" #include "exec/exec-all.h" #include "tcg-op.h" +#include "tcg-op-gvec.h" #include "qemu/log.h" #include "arm_ldst.h" #include "translate.h" @@ -83,6 +84,10 @@ typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64); typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32); typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32); +/* Note that the gvec expanders operate on offsets + sizes. */ +typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t, + uint32_t, uint32_t, uint32_t); + /* initialize TCG globals. */ void a64_translate_init(void) { @@ -535,6 +540,21 @@ static inline int vec_reg_offset(DisasContext *s, int regno, return offs; } +/* Return the offset info CPUARMState of the "whole" vector register Qn. */ +static inline int vec_full_reg_offset(DisasContext *s, int regno) +{ + assert_fp_access_checked(s); + return offsetof(CPUARMState, vfp.regs[regno * 2]); +} + +/* Return the byte size of the "whole" vector register, VL / 8. */ +static inline int vec_full_reg_size(DisasContext *s) +{ + /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags. + In the meantime this is just the AdvSIMD length of 128. */ + return 128 / 8; +} + /* Return the offset into CPUARMState of a slice (from * the least significant end) of FP register Qn (ie * Dn, Sn, Hn or Bn). @@ -9048,85 +9068,125 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) } } +static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) +{ + tcg_gen_xor_i64(rn, rn, rm); + tcg_gen_and_i64(rn, rn, rd); + tcg_gen_xor_i64(rd, rm, rn); +} + +static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) +{ + tcg_gen_xor_i64(rn, rn, rd); + tcg_gen_and_i64(rn, rn, rm); + tcg_gen_xor_i64(rd, rd, rn); +} + +static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) +{ + tcg_gen_xor_i64(rn, rn, rd); + tcg_gen_andc_i64(rn, rn, rm); + tcg_gen_xor_i64(rd, rd, rn); +} + +static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) +{ + tcg_gen_xor_vec(vece, rn, rn, rm); + tcg_gen_and_vec(vece, rn, rn, rd); + tcg_gen_xor_vec(vece, rd, rm, rn); +} + +static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) +{ + tcg_gen_xor_vec(vece, rn, rn, rd); + tcg_gen_and_vec(vece, rn, rn, rm); + tcg_gen_xor_vec(vece, rd, rd, rn); +} + +static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) +{ + tcg_gen_xor_vec(vece, rn, rn, rd); + tcg_gen_andc_vec(vece, rn, rn, rm); + tcg_gen_xor_vec(vece, rd, rd, rn); +} + /* Logic op (opcode == 3) subgroup of C3.6.16. */ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn) { + static const GVecGen3 bsl_op = { + .fni8 = gen_bsl_i64, + .fniv = gen_bsl_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true + }; + static const GVecGen3 bit_op = { + .fni8 = gen_bit_i64, + .fniv = gen_bit_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true + }; + static const GVecGen3 bif_op = { + .fni8 = gen_bif_i64, + .fniv = gen_bif_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true + }; + int rd = extract32(insn, 0, 5); int rn = extract32(insn, 5, 5); int rm = extract32(insn, 16, 5); int size = extract32(insn, 22, 2); bool is_u = extract32(insn, 29, 1); bool is_q = extract32(insn, 30, 1); - TCGv_i64 tcg_op1, tcg_op2, tcg_res[2]; - int pass; + GVecGen3Fn *gvec_fn; + const GVecGen3 *gvec_op; if (!fp_access_check(s)) { return; } - tcg_op1 = tcg_temp_new_i64(); - tcg_op2 = tcg_temp_new_i64(); - tcg_res[0] = tcg_temp_new_i64(); - tcg_res[1] = tcg_temp_new_i64(); - - for (pass = 0; pass < (is_q ? 2 : 1); pass++) { - read_vec_element(s, tcg_op1, rn, pass, MO_64); - read_vec_element(s, tcg_op2, rm, pass, MO_64); - - if (!is_u) { - switch (size) { - case 0: /* AND */ - tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 1: /* BIC */ - tcg_gen_andc_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 2: /* ORR */ - tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 3: /* ORN */ - tcg_gen_orc_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - } - } else { - if (size != 0) { - /* B* ops need res loaded to operate on */ - read_vec_element(s, tcg_res[pass], rd, pass, MO_64); - } - - switch (size) { - case 0: /* EOR */ - tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2); - break; - case 1: /* BSL bitwise select */ - tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2); - tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]); - tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1); - break; - case 2: /* BIT, bitwise insert if true */ - tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]); - tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2); - tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1); - break; - case 3: /* BIF, bitwise insert if false */ - tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]); - tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2); - tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1); - break; - } - } - } + switch (size + 4 * is_u) { + case 0: /* AND */ + gvec_fn = tcg_gen_gvec_and; + goto do_fn; + case 1: /* BIC */ + gvec_fn = tcg_gen_gvec_andc; + goto do_fn; + case 2: /* ORR */ + gvec_fn = tcg_gen_gvec_or; + goto do_fn; + case 3: /* ORN */ + gvec_fn = tcg_gen_gvec_orc; + goto do_fn; + case 4: /* EOR */ + gvec_fn = tcg_gen_gvec_xor; + goto do_fn; + do_fn: + gvec_fn(0, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + + case 5: /* BSL bitwise select */ + gvec_op = &bsl_op; + goto do_op; + case 6: /* BIT, bitwise insert if true */ + gvec_op = &bit_op; + goto do_op; + case 7: /* BIF, bitwise insert if false */ + gvec_op = &bif_op; + goto do_op; + do_op: + tcg_gen_gvec_3(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), gvec_op); + return; - write_vec_element(s, tcg_res[0], rd, 0, MO_64); - if (!is_q) { - tcg_gen_movi_i64(tcg_res[1], 0); + default: + g_assert_not_reached(); } - write_vec_element(s, tcg_res[1], rd, 1, MO_64); - - tcg_temp_free_i64(tcg_op1); - tcg_temp_free_i64(tcg_op2); - tcg_temp_free_i64(tcg_res[0]); - tcg_temp_free_i64(tcg_res[1]); } /* Helper functions for 32 bit comparisons */ @@ -9387,6 +9447,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) int rn = extract32(insn, 5, 5); int rd = extract32(insn, 0, 5); int pass; + GVecGen3Fn *gvec_op; switch (opcode) { case 0x13: /* MUL, PMUL */ @@ -9426,6 +9487,16 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) return; } + switch (opcode) { + case 0x10: /* ADD, SUB */ + gvec_op = u ? tcg_gen_gvec_sub : tcg_gen_gvec_add; + gvec_op(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } + if (size == 3) { assert(is_q); for (pass = 0; pass < 2; pass++) { @@ -9598,16 +9669,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn = fns[size][u]; break; } - case 0x10: /* ADD, SUB */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 }, - { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 }, - { tcg_gen_add_i32, tcg_gen_sub_i32 }, - }; - genfn = fns[size][u]; - break; - } case 0x11: /* CMTST, CMEQ */ { static NeonGenTwoOpFn * const fns[3][2] = { From patchwork Sat Jan 6 03:13:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123609 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp126432qgn; Fri, 5 Jan 2018 19:21:51 -0800 (PST) X-Google-Smtp-Source: ACJfBotZrIcgkWOplG2UZhr1RWLYiSrVnKSfmDx8nVxmwHQsITrkUWS1mxmtAJF1DvNaTK+531n5 X-Received: by 10.129.62.1 with SMTP id l1mr4612000ywa.215.1515208911079; Fri, 05 Jan 2018 19:21:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208911; cv=none; d=google.com; s=arc-20160816; b=y0oXwErg2KoI1v8mqzFQr6EIQeL8EuaDAj5pXTPAgwQge7ONs5j1G8v9NvyQm2MAdt Iuf9+8x21XHpFBIjt0RKo5CA0mVOF2plrgFxhac5jvkS1dgA5Hl2mDIGcaKyi6AsN8Mf BHfGAwdrtCA83XKIxFLkyt/pXmX9JffZl8Mz+WEFPQ9fdbtzlAScN5F2OQBe4S1xfeLk +ff77m1RfwC7H9l2TgwNotDlUld0JUtQZ9H14PBlzLFS+Hx3t4zhiV5Wj8Y3HPjOGfX8 2rOck/Gu0Y2nB0ogo/sbDobZW04G9u2h5T7Zcfj7FkW5dLGmSuwAToOqLWIN5LgEW+Sb 0e2Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=acxGEq2fNn/6enx58EagUxLLjzbVqjr4Z7gQAnNLh/k=; b=kyCgvBuuEuPgTl7QV5L4A9fnQ++3tNZCIePb/fTUPvRBtJtgwJ1KqFVn29tsyqZiut GB7QktEGfsg9T3WmCAQNeAmMOCNOW51gvpdv37CO+dOu6CTAeRtrFUKK5yYDjel0qQZQ i2c2iWcNg4E3inZW/58NUQbLBatTbBsL1FsTNaMyOOMnyHK5bD0L3XxOXL2nwGKQx5rU ZlSnCqQ2G4z5AX7+U3TXgudX89MpFmJFIEEHRnMqlI2LJ3Ybffl/G6M4u4OSRuYcLS4n Ia5zur3jPJvV7oDUzTh8eL582v0MeQTWwSo5UcoLd3rUdnGIy6CsxN/2u+w81XwDKWKj 4sTw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=cYsuxqIr; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id c25si1505064ybe.757.2018.01.05.19.21.51 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:21:51 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=cYsuxqIr; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44071 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf3O-00056l-GP for patch@linaro.org; Fri, 05 Jan 2018 22:21:50 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48407) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew4-0007d9-4d for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXew2-0005QZ-Tt for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:16 -0500 Received: from mail-pg0-x241.google.com ([2607:f8b0:400e:c05::241]:42485) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXew2-0005P7-NT for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:14 -0500 Received: by mail-pg0-x241.google.com with SMTP id q67so2709665pga.9 for ; Fri, 05 Jan 2018 19:14:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=acxGEq2fNn/6enx58EagUxLLjzbVqjr4Z7gQAnNLh/k=; b=cYsuxqIrHTkBZa6XyRkCEJWsdA+C0WolPFkteZNbd0FzYBa3w7UP53qovdbo0tymmn 4Zv0NXD5trqPSPEil2c828WEXIvTAPrw2i5iDff0YUpJ9CkeX4LW9SJKwF0nQnP2XLeF GiGqs44jpd33G0hUPmfd84RayVafxuf3QNB6E= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=acxGEq2fNn/6enx58EagUxLLjzbVqjr4Z7gQAnNLh/k=; b=A2F6e1Y1IY2TmX4nxb9SUJy1dzFI0VX2K1Trnw4bdwjJ/S8dPtc50M/Rf+PeQ5S876 86h9FDQ+RvLNhLcVgizve7Qbb4fQY90+apWNoKHoYYha9T4cW7j7y4fs8/UY3FHTzHAz 3Rc3DYKUDULfvuztdwHI87YgHF0Q1f1IjHaFVkN2U5DOBXKEx+JEH0AMX7r368a7Gwzo oM5TjR1TJaGKkxed2gNdvrcOD3YG/rx/9XWOvHKBo5aAw6uciQHvGahf5JldhoHXmIw/ OMVRKUJ6GtiuUhJVCV59x+Y1yA635DvKHr4uPi/UqHrC4LX5krzoZ/HrhHPeqGfxWbyO WV7A== X-Gm-Message-State: AKGB3mIPcIaviqpI3dhbTaxFyEu0RM5w2JD3zYm5iuqym3NUzONEH6ka QQUfWbIFfhtzS/Rds4qv5+tbjdOpn+o= X-Received: by 10.101.69.13 with SMTP id n13mr4097357pgq.91.1515208453482; Fri, 05 Jan 2018 19:14:13 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.12 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:12 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:38 -0800 Message-Id: <20180106031346.6650-16-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::241 Subject: [Qemu-devel] [PATCH v8 15/23] target/arm: Use vector infrastructure for aa64 mov/not/neg X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 43 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 5 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 572af456d1..bc14c28e71 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -85,6 +85,7 @@ typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32); typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32); /* Note that the gvec expanders operate on offsets + sizes. */ +typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t); typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); @@ -4579,14 +4580,19 @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) TCGv_i64 tcg_op; TCGv_i64 tcg_res; + switch (opcode) { + case 0x0: /* FMOV */ + tcg_gen_gvec_mov(0, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + 8, vec_full_reg_size(s)); + return; + } + fpst = get_fpstatus_ptr(); tcg_op = read_fp_dreg(s, rn); tcg_res = tcg_temp_new_i64(); switch (opcode) { - case 0x0: /* FMOV */ - tcg_gen_mov_i64(tcg_res, tcg_op); - break; case 0x1: /* FABS */ gen_helper_vfp_absd(tcg_res, tcg_op); break; @@ -9153,6 +9159,12 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn) gvec_fn = tcg_gen_gvec_andc; goto do_fn; case 2: /* ORR */ + if (rn == rm) { /* MOV */ + tcg_gen_gvec_mov(0, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } gvec_fn = tcg_gen_gvec_or; goto do_fn; case 3: /* ORN */ @@ -10032,6 +10044,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) int rmode = -1; TCGv_i32 tcg_rmode; TCGv_ptr tcg_fpstatus; + GVecGen2Fn *gvec_fn; switch (opcode) { case 0x0: /* REV64, REV32 */ @@ -10040,8 +10053,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) return; case 0x5: /* CNT, NOT, RBIT */ if (u && size == 0) { - /* NOT: adjust size so we can use the 64-bits-at-a-time loop. */ - size = 3; + /* NOT */ break; } else if (u && size == 1) { /* RBIT */ @@ -10293,6 +10305,27 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) tcg_rmode = NULL; } + switch (opcode) { + case 0x5: + if (u && size == 0) { /* NOT */ + gvec_fn = tcg_gen_gvec_not; + goto do_fn; + } + break; + case 0xb: + if (u) { /* NEG */ + gvec_fn = tcg_gen_gvec_neg; + goto do_fn; + } + break; + + do_fn: + gvec_fn(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; + } + if (size == 3) { /* All 64-bit element operations can be shared with scalar 2misc */ int pass; From patchwork Sat Jan 6 03:13:39 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123613 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp128093qgn; Fri, 5 Jan 2018 19:24:31 -0800 (PST) X-Google-Smtp-Source: ACJfBouVm0jNUM1FEIo5OdkdxoAp1WgcD3ntfRHxc38gmWl4+U32oaEhXnz/x+6TXcV2UiARvH6I X-Received: by 10.37.133.72 with SMTP id f8mr4769827ybn.339.1515209071524; Fri, 05 Jan 2018 19:24:31 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209071; cv=none; d=google.com; s=arc-20160816; b=tmh9XqFOxxP0ar5wafUAzleIC/xrb3UG/Tg9tCmgZYX2xCni6cUz8EHtn5vKFV9yAY 3mjzejrC9E6J6Mrw0/K5NjZnlKQlVlWKczAwJ4xFR0aSyRxEDbVp+aMHJql9SpAD+H4C PB9pgewyM+Bqj3DbO2+NGMEUoyvTw0/L31+CAOtXKYz1dYolVUyLFmXgW1+5iWHRlOnz swnJWIt6CA1NI/3JW0/lj0be/NBDzAvW2k+RkaWkcstKNY3+i0xb42euoOTEbzB8W2pQ xW0mXSNSqmE1DbOh4/+0BgK9BklP8GVpL9EbWtwkG+4YknNXRYGQrDTK6nb6TC0pR0bV Q7lQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=okhsN7PvAI2tSwB7zlFXSFL3pRM/854PTOSz2h1XJ54=; b=SnYJI7E+worNLf+7N834O1nEVrOtFEX2xlcuZjj6KWhh9t53J2m083Q82HFPiKLF/z gBh4JQgRW8YXBwN9/Db19HEtctWpSk8PAbRMm1GqydHTilxmaJMNxo8jLYFfdvvH6l3r gx3EcNAk2ilOLyMPeYo72UcYueFjE0iPsGaMgVUY5FpBNVwhLNyS3lKVaAiI35zZuUT6 d3JfJaFfvUkD30LPgKLswEuJyLakRYvXYunLhzzP34XLhl/EcZY/RTNnVmVrbOTdp/qx k3UeLZ0e8tclmhWzAZOPGKzGktY2LKH2Mg+TbVKEivMlguZQz2hS9sLAvfThB7pN5i4F aJcw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=VdRp3ORO; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id s67si1510629ywf.183.2018.01.05.19.24.31 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:24:31 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=VdRp3ORO; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44086 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf5y-0007dJ-V8 for patch@linaro.org; Fri, 05 Jan 2018 22:24:30 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48426) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew5-0007eQ-Ea for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:18 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXew4-0005UU-Di for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:17 -0500 Received: from mail-pf0-x241.google.com ([2607:f8b0:400e:c00::241]:35385) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXew4-0005SW-5W for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:16 -0500 Received: by mail-pf0-x241.google.com with SMTP id t12so138813pfg.2 for ; Fri, 05 Jan 2018 19:14:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=okhsN7PvAI2tSwB7zlFXSFL3pRM/854PTOSz2h1XJ54=; b=VdRp3OROsx/oHvGj2+JQ1PyFe5bGltXbkCjcSvb0ZS0Gi4XO4jpElo0igVCgQ8NPIa C1vhv7Yh67vhgQr3le3IuMNm3BrzXb0bVO9vSggcVfo5y7UHPB0WKHCh7oGMOPmA627E WMRBZPa1Xt2cvOWolaBtDW841gjlfaJnxjibY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=okhsN7PvAI2tSwB7zlFXSFL3pRM/854PTOSz2h1XJ54=; b=NMBR4/tCFZTsynKS+w71NlZAsBCsmDoc7Tzfuy7rsQJaZkFrADbfb/Kj69GOhdYmhy HZyJH3rnPmruEX9Cs10wHBqetdgC09hQYpTczbA9gBTN248olOdXFTEiN9mb2c3fDblc r8UYxsUucXFQxlqsKwRfzouEog/oVEzIPhim1BHsnC9UczNb4srkaWHBUmk8UGMbXA21 JRlw9YX0QoRUAOVpVTKQqAkjMo8uJZcGZVz/G0GrxeBvMZzQ87PILNVvgkFg15GAGY+T 3KwGm4VnHwwK0WbHxleEc5f0xFNtsynjJUC95QbS/7H6TqgNYS1Bn0nE3hgZOoDBD0Nr agkg== X-Gm-Message-State: AKGB3mKj2mGXygC4vEsI7KYXVYySpXjwoCdFZSx28Jt/rBXqYDCRStH4 RDPt8So8nR3z+IrBruiDEewGc9U0L+k= X-Received: by 10.99.104.200 with SMTP id d191mr661660pgc.98.1515208454935; Fri, 05 Jan 2018 19:14:14 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.13 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:14 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:39 -0800 Message-Id: <20180106031346.6650-17-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::241 Subject: [Qemu-devel] [PATCH v8 16/23] target/arm: Use vector infrastructure for aa64 dup/movi X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 83 +++++++++++++++++++--------------------------- 1 file changed, 34 insertions(+), 49 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index bc14c28e71..55a4902fc2 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -5846,38 +5846,24 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) * * size: encoded in imm5 (see ARM ARM LowestSetBit()) */ + static void handle_simd_dupe(DisasContext *s, int is_q, int rd, int rn, int imm5) { int size = ctz32(imm5); - int esize = 8 << size; - int elements = (is_q ? 128 : 64) / esize; - int index, i; - TCGv_i64 tmp; + int index = imm5 >> (size + 1); if (size > 3 || (size == 3 && !is_q)) { unallocated_encoding(s); return; } - if (!fp_access_check(s)) { return; } - index = imm5 >> (size + 1); - - tmp = tcg_temp_new_i64(); - read_vec_element(s, tmp, rn, index, size); - - for (i = 0; i < elements; i++) { - write_vec_element(s, tmp, rd, i, size); - } - - if (!is_q) { - clear_vec_high(s, rd); - } - - tcg_temp_free_i64(tmp); + tcg_gen_gvec_dup_mem(size, vec_full_reg_offset(s, rd), + vec_reg_offset(s, rn, index, size), + is_q ? 16 : 8, vec_full_reg_size(s)); } /* DUP (element, scalar) @@ -5926,9 +5912,7 @@ static void handle_simd_dupg(DisasContext *s, int is_q, int rd, int rn, int imm5) { int size = ctz32(imm5); - int esize = 8 << size; - int elements = (is_q ? 128 : 64)/esize; - int i = 0; + uint32_t dofs, oprsz, maxsz; if (size > 3 || ((size == 3) && !is_q)) { unallocated_encoding(s); @@ -5939,12 +5923,11 @@ static void handle_simd_dupg(DisasContext *s, int is_q, int rd, int rn, return; } - for (i = 0; i < elements; i++) { - write_vec_element(s, cpu_reg(s, rn), rd, i, size); - } - if (!is_q) { - clear_vec_high(s, rd); - } + dofs = vec_full_reg_offset(s, rd); + oprsz = is_q ? 16 : 8; + maxsz = vec_full_reg_size(s); + + tcg_gen_gvec_dup_i64(size, dofs, oprsz, maxsz, cpu_reg(s, rn)); } /* INS (Element) @@ -6135,7 +6118,6 @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn) bool is_neg = extract32(insn, 29, 1); bool is_q = extract32(insn, 30, 1); uint64_t imm = 0; - TCGv_i64 tcg_rd, tcg_imm; int i; if (o2 != 0 || ((cmode == 0xf) && is_neg && !is_q)) { @@ -6217,32 +6199,35 @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn) imm = ~imm; } - tcg_imm = tcg_const_i64(imm); - tcg_rd = new_tmp_a64(s); + if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) { + /* MOVI or MVNI, with MVNI negation handled above. */ + tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), is_q ? 16 : 8, + vec_full_reg_size(s), imm); + } else { + TCGv_i64 tcg_imm = tcg_const_i64(imm); + TCGv_i64 tcg_rd = new_tmp_a64(s); - for (i = 0; i < 2; i++) { - int foffs = i ? fp_reg_hi_offset(s, rd) : fp_reg_offset(s, rd, MO_64); + for (i = 0; i < 2; i++) { + int foffs = vec_reg_offset(s, rd, i, MO_64); - if (i == 1 && !is_q) { - /* non-quad ops clear high half of vector */ - tcg_gen_movi_i64(tcg_rd, 0); - } else if ((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9) { - tcg_gen_ld_i64(tcg_rd, cpu_env, foffs); - if (is_neg) { - /* AND (BIC) */ - tcg_gen_and_i64(tcg_rd, tcg_rd, tcg_imm); + if (i == 1 && !is_q) { + /* non-quad ops clear high half of vector */ + tcg_gen_movi_i64(tcg_rd, 0); } else { - /* ORR */ - tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_imm); + tcg_gen_ld_i64(tcg_rd, cpu_env, foffs); + if (is_neg) { + /* AND (BIC) */ + tcg_gen_and_i64(tcg_rd, tcg_rd, tcg_imm); + } else { + /* ORR */ + tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_imm); + } } - } else { - /* MOVI */ - tcg_gen_mov_i64(tcg_rd, tcg_imm); + tcg_gen_st_i64(tcg_rd, cpu_env, foffs); } - tcg_gen_st_i64(tcg_rd, cpu_env, foffs); - } - tcg_temp_free_i64(tcg_imm); + tcg_temp_free_i64(tcg_imm); + } } /* AdvSIMD scalar copy From patchwork Sat Jan 6 03:13:40 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123610 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp127138qgn; Fri, 5 Jan 2018 19:22:53 -0800 (PST) X-Google-Smtp-Source: ACJfBouvHX4E/Tw8Zqi0JvS/XWnFQF8Vxvp+kwr+8rZLuU7G5dqHKBryaZOLNtioDvub91WgHTwg X-Received: by 10.37.234.2 with SMTP id p2mr4627084ybd.251.1515208973612; Fri, 05 Jan 2018 19:22:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515208973; cv=none; d=google.com; s=arc-20160816; b=fOSbNSqUIR0hYFrO0IzftADbd0ugCrcfCq22R4TvrgC/3AxoUTrB1aIfieOgPiWEbr h5/xMfHuez+rsDBbgFwGLU2yk/UMAnBuLDABivgC+oBwq5366qGrC1nW8z6VfVpJjzZC UzoQgocn4Q8O4IN8a+nFxp9tXWroHSO8eEg6834fF5WelJqVZ6N/DDZLNF8S0v7yZHz4 yRUJiO/98MvxhYidlh025RHDdk+Qlw8UZha+dCNg/ZySuI/hgl36i9MHt/saqiMp0Ukl UCO3ch1e8uPKPAJqKW6ee+OB7B9cWS4jro2MpBxfX018h3U9xdX/8Tiq+UnRgLrvO+MY prpA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=fglbqt0ZZi+UTUzzw2VU5fDhQFOJREmbY828m0tqWIs=; b=scAL2aliF5ryV2/m3rP0ueqyngxZP/Elih7K/91BU+EC0cYNiptrCmNjeM0B9py6wA Om8hXRAQ7KUYIiOKU6aHDHqHzeK2tN+YWK5CrlQw6NyD/rxgMzPterBo94P+I/2aw1CQ J/KaGmnZgPFlNt0TVX0NX/X2lGczj/na/gMCTQOfuHrdGnvAMmpcVhpjcwNRMqpVTUhV cwz1aCb5ZG8zpo14bL+T+AX/VGWIYBkG/tjSJEwVfY4BKQFC2auGfS+d/sdyb+uUvCHP 4JFIO7AY6JDkC9OOBFgT9vwNl7Y/d33N7r7qAHCLyW6PMMA/P0kf32oo8W6APEFtfAC2 ulyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Gmuv36bF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id p10si1483541ywm.21.2018.01.05.19.22.53 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:22:53 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Gmuv36bF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44074 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf4O-0005FQ-Oh for patch@linaro.org; Fri, 05 Jan 2018 22:22:52 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48438) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew7-0007gE-3e for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:20 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXew5-0005Xy-SR for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:19 -0500 Received: from mail-pg0-x242.google.com ([2607:f8b0:400e:c05::242]:45387) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXew5-0005WL-K2 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:17 -0500 Received: by mail-pg0-x242.google.com with SMTP id c194so1735616pga.12 for ; Fri, 05 Jan 2018 19:14:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=fglbqt0ZZi+UTUzzw2VU5fDhQFOJREmbY828m0tqWIs=; b=Gmuv36bFxczwpQfR6+K/ek9uS5qSfqNEkmeSP0PSuAlsrMAOp8A1xPbuH4zCsm9kLZ i0dlFyD1o7KAsXDCGQpOSrFGlQ3W7IImEubffOnucoD2IQ8muXfMQ39pTJXBNKh0SwHm CTauTALRQ1OJbTDKqYpulsj4HPR845jQcGOco= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=fglbqt0ZZi+UTUzzw2VU5fDhQFOJREmbY828m0tqWIs=; b=VILOTjSyn25DuyWDPAPhWcgJlMzi6+BbyS+Tp6C7uNkRssdkyVSyzblk+WCeGh90eT unmSjKNFsc9vFlUPqFvwbQba8cbvGyye/qHwskQa4cR8Y8CC/UZ1J+ZvvovFfvhQsQUo Kc8lMzfqwz61mow2x8DIngjbHbjoBcaSxBwxUxTXM5k+phsdZvjWPFlMEQjI2D1UlHiT S9+tXB4To6J0yzIZ6CIaBiraZ6t8P0g08n1tL4GrXTx2jGD4dY60bTYLc8GTAKXmUuxv IOx+V0bnR40fuqDbcYEBEDTyIIElCxcqX7riQavGZ4hamBg1srHmqaQ6ctKijvR4Cxvv AQ1w== X-Gm-Message-State: AKGB3mJ7cfqj3EfHFpf6mbKSa4ks4FV8gwV5by/AB9126F8fJMzkyqLJ 7c1F/NkLaTinZYEYQDkjJANGNuDE/Oc= X-Received: by 10.99.114.30 with SMTP id n30mr444841pgc.178.1515208456358; Fri, 05 Jan 2018 19:14:16 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.15 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:15 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:40 -0800 Message-Id: <20180106031346.6650-18-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::242 Subject: [Qemu-devel] [PATCH v8 17/23] target/arm: Use vector infrastructure for aa64 zip/uzp/trn/xtn X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 103 +++++++++++++++------------------------------ 1 file changed, 35 insertions(+), 68 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 55a4902fc2..8769b4505a 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -5576,11 +5576,7 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t insn) int opcode = extract32(insn, 12, 2); bool part = extract32(insn, 14, 1); bool is_q = extract32(insn, 30, 1); - int esize = 8 << size; - int i, ofs; - int datasize = is_q ? 128 : 64; - int elements = datasize / esize; - TCGv_i64 tcg_res, tcg_resl, tcg_resh; + GVecGen3Fn *gvec_fn; if (opcode == 0 || (size == 3 && !is_q)) { unallocated_encoding(s); @@ -5591,60 +5587,24 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t insn) return; } - tcg_resl = tcg_const_i64(0); - tcg_resh = tcg_const_i64(0); - tcg_res = tcg_temp_new_i64(); - - for (i = 0; i < elements; i++) { - switch (opcode) { - case 1: /* UZP1/2 */ - { - int midpoint = elements / 2; - if (i < midpoint) { - read_vec_element(s, tcg_res, rn, 2 * i + part, size); - } else { - read_vec_element(s, tcg_res, rm, - 2 * (i - midpoint) + part, size); - } - break; - } - case 2: /* TRN1/2 */ - if (i & 1) { - read_vec_element(s, tcg_res, rm, (i & ~1) + part, size); - } else { - read_vec_element(s, tcg_res, rn, (i & ~1) + part, size); - } - break; - case 3: /* ZIP1/2 */ - { - int base = part * elements / 2; - if (i & 1) { - read_vec_element(s, tcg_res, rm, base + (i >> 1), size); - } else { - read_vec_element(s, tcg_res, rn, base + (i >> 1), size); - } - break; - } - default: - g_assert_not_reached(); - } - - ofs = i * esize; - if (ofs < 64) { - tcg_gen_shli_i64(tcg_res, tcg_res, ofs); - tcg_gen_or_i64(tcg_resl, tcg_resl, tcg_res); - } else { - tcg_gen_shli_i64(tcg_res, tcg_res, ofs - 64); - tcg_gen_or_i64(tcg_resh, tcg_resh, tcg_res); - } + switch (opcode) { + case 1: /* UZP1/2 */ + gvec_fn = part ? tcg_gen_gvec_uzpo : tcg_gen_gvec_uzpe; + break; + case 2: /* TRN1/2 */ + gvec_fn = part ? tcg_gen_gvec_trno : tcg_gen_gvec_trne; + break; + case 3: /* ZIP1/2 */ + gvec_fn = part ? tcg_gen_gvec_ziph : tcg_gen_gvec_zipl; + break; + default: + g_assert_not_reached(); } - tcg_temp_free_i64(tcg_res); - - write_vec_element(s, tcg_resl, rd, 0, MO_64); - tcg_temp_free_i64(tcg_resl); - write_vec_element(s, tcg_resh, rd, 1, MO_64); - tcg_temp_free_i64(tcg_resh); + gvec_fn(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); } static void do_minmaxop(DisasContext *s, TCGv_i32 tcg_elt1, TCGv_i32 tcg_elt2, @@ -7922,6 +7882,22 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, int destelt = is_q ? 2 : 0; int passes = scalar ? 1 : 2; + if (opcode == 0x12 && !u) { /* XTN, XTN2 */ + tcg_debug_assert(!scalar); + if (is_q) { /* XTN2 */ + tcg_gen_gvec_uzpe(size, vec_reg_offset(s, rd, 1, MO_64), + vec_reg_offset(s, rn, 0, MO_64), + vec_reg_offset(s, rn, 1, MO_64), + 8, vec_full_reg_size(s) - 8); + } else { + tcg_gen_gvec_uzpe(size, vec_reg_offset(s, rd, 0, MO_64), + vec_reg_offset(s, rn, 0, MO_64), + vec_reg_offset(s, rn, 1, MO_64), + 8, vec_full_reg_size(s)); + } + return; + } + if (scalar) { tcg_res[1] = tcg_const_i32(0); } @@ -7939,23 +7915,14 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, tcg_res[pass] = tcg_temp_new_i32(); switch (opcode) { - case 0x12: /* XTN, SQXTUN */ + case 0x12: /* , SQXTUN */ { - static NeonGenNarrowFn * const xtnfns[3] = { - gen_helper_neon_narrow_u8, - gen_helper_neon_narrow_u16, - tcg_gen_extrl_i64_i32, - }; static NeonGenNarrowEnvFn * const sqxtunfns[3] = { gen_helper_neon_unarrow_sat8, gen_helper_neon_unarrow_sat16, gen_helper_neon_unarrow_sat32, }; - if (u) { - genenvfn = sqxtunfns[size]; - } else { - genfn = xtnfns[size]; - } + genenvfn = sqxtunfns[size]; break; } case 0x14: /* SQXTN, UQXTN */ From patchwork Sat Jan 6 03:13:41 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123617 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp129669qgn; Fri, 5 Jan 2018 19:27:13 -0800 (PST) X-Google-Smtp-Source: ACJfBovr4jdyQjRMuFl86ExGfCuFxeQv/jDkX+lLoGXA6Km7pDAmXxYeq0iq+D9WKExXKM+eoW3D X-Received: by 10.129.72.144 with SMTP id v138mr4559202ywa.435.1515209233855; Fri, 05 Jan 2018 19:27:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209233; cv=none; d=google.com; s=arc-20160816; b=IGv1VwMP9uP3Lh6FOafBi5xCrNmq2uqAvEmPRHNGA1tfvh0gp9zVZ1rVK/HiBwSmIV +AVSy+Q3QZAjQWpJrGK3cfmhmhFjqpJDBROgiwfRBsjiAgtzaiuC0CcABOGYr6aoTbMa k2yP/8a4f22AxBFUFRgQtjv19SwSDM/LCKwnLUB3ew9gu5+27M1nLBe4s5z/YdWL8Y9o U6ARQPdQTDcdAn7I8Asa4kUJ2aalTxGjGfA5MA+ziLS4o1CZ7brs7GTYwJeeYJCLqa4z HyORXjV8ENlPQGWRI0KEOBns1equ666qxrRJssSvRU0i3F0xjrD/X1EN2dajBWDCNBWq QzRg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=J2t98NDKJ8I7okoECSqhVltubOYcoy+ef/vScpQSPqs=; b=Ui1YcdbCE7eBGebRa/Dim2x/NTlgrgReqla4QvjuVH1+ANiA04ih6GC/x767GLsVWr e8+uX3M6nPJX4Mvog0BWthofWWPbcQYEeKTKIR9km1jLG6/ufyMWf5QPuiE9Uqson7x6 m8jyaHS7G9rddsrNzgi+deL3G+NEscYbDBjddxz3Xzk/z1bE9akXn9tqvCnufUCadTJ4 OGwMJuX2JatrHOtXNHC88Hengroro9HxjD32BPnDNE1EbJM5BZkNUweWEct9RZrdMzuv 5GkP4LaSfPxTR6KvZw2Eyo2J2JGGCjQNr7fjMn9VYn0Z0Ah7glWmaIxNuNmn0njhpigW poSQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=MABUZyvo; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id s15si1484351ybg.390.2018.01.05.19.27.13 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:27:13 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=MABUZyvo; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44105 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf8b-0001Zn-8X for patch@linaro.org; Fri, 05 Jan 2018 22:27:13 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48451) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXew9-0007ie-4A for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:25 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXew7-0005ax-8f for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:21 -0500 Received: from mail-pg0-x244.google.com ([2607:f8b0:400e:c05::244]:34071) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXew7-0005ZS-0Q for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:19 -0500 Received: by mail-pg0-x244.google.com with SMTP id j4so2719228pgp.1 for ; Fri, 05 Jan 2018 19:14:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=J2t98NDKJ8I7okoECSqhVltubOYcoy+ef/vScpQSPqs=; b=MABUZyvocP+1ALPxcyScwdJrorxfqd1pi+jC0Mu3Aixwjge5R3bbE/vxqyzu7uBpM/ bUshaXuwnLQ0nKtE+FDYoDvqlnxw88Rqii/P4Jj1eeLhfrmZknlnSt3k9l+FrvU3vRSx ifQTZLGas4rL2gkAAU4bBSORcroT2nzA0rmLo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=J2t98NDKJ8I7okoECSqhVltubOYcoy+ef/vScpQSPqs=; b=hrI/tGCDGLeHufo3YsAPo/x8n9ZjxaP2tOJWF8PFDGC9Uk/WfLfbCUAHQWTYsTxonV WzU3Vt0UHskttCEqFiBuPWlZtubqudnReNdETLkFfXt/Kk+Z/JNnfUwsK4co9ApVoWkK Zt0RmBKG1pb9OCgPfUtCvQdGHh817ZdGtKMRhKDVQ7zsbgVO0moWUhqKgrgYoZpEHmw0 KECvZIozTW5AIQ7QDGxsu3681oowni2JviqtrWBPYQwSHTv2Yyl0azVx3eWXu7on36AC 9DfgsAWcMdoNiiyYmQ55QSFF6TMSnrI7rpdOK5UMt/zvK8Lxd6OaqJ6rsLqmoMtESmU4 oBjQ== X-Gm-Message-State: AKGB3mJ0YMCj0v4Qxav8TBOO/rKX+1zyZ9ttwfEf2sfcETn6talIcPWu 2UVtn1YD7fLRK0DzUvI3FcjYOUWmKfs= X-Received: by 10.101.74.193 with SMTP id c1mr4174540pgu.408.1515208457767; Fri, 05 Jan 2018 19:14:17 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.16 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:16 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:41 -0800 Message-Id: <20180106031346.6650-19-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::244 Subject: [Qemu-devel] [PATCH v8 18/23] target/arm: Use vector infrastructure for aa64 constant shifts X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 386 ++++++++++++++++++++++++++++++++++++++------- 1 file changed, 329 insertions(+), 57 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 8769b4505a..d8bb3bbb25 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -6432,17 +6432,6 @@ static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src, } } -/* Common SHL/SLI - Shift left with an optional insert */ -static void handle_shli_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src, - bool insert, int shift) -{ - if (insert) { /* SLI */ - tcg_gen_deposit_i64(tcg_res, tcg_res, tcg_src, shift, 64 - shift); - } else { /* SHL */ - tcg_gen_shli_i64(tcg_res, tcg_src, shift); - } -} - /* SRI: shift right with insert */ static void handle_shri_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src, int size, int shift) @@ -6546,7 +6535,11 @@ static void handle_scalar_simd_shli(DisasContext *s, bool insert, tcg_rn = read_fp_dreg(s, rn); tcg_rd = insert ? read_fp_dreg(s, rd) : tcg_temp_new_i64(); - handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift); + if (insert) { + tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, shift, 64 - shift); + } else { + tcg_gen_shli_i64(tcg_rd, tcg_rn, shift); + } write_fp_dreg(s, rd, tcg_rd); @@ -8283,16 +8276,195 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } } +static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_vec_sar8i_i64(a, a, shift); + tcg_gen_vec_add8_i64(d, d, a); +} + +static void gen_ssra16_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_vec_sar16i_i64(a, a, shift); + tcg_gen_vec_add16_i64(d, d, a); +} + +static void gen_ssra32_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) +{ + tcg_gen_sari_i32(a, a, shift); + tcg_gen_add_i32(d, d, a); +} + +static void gen_ssra64_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_sari_i64(a, a, shift); + tcg_gen_add_i64(d, d, a); +} + +static void gen_ssra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) +{ + tcg_gen_sari_vec(vece, a, a, sh); + tcg_gen_add_vec(vece, d, d, a); +} + +static void gen_usra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_vec_shr8i_i64(a, a, shift); + tcg_gen_vec_add8_i64(d, d, a); +} + +static void gen_usra16_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_vec_shr16i_i64(a, a, shift); + tcg_gen_vec_add16_i64(d, d, a); +} + +static void gen_usra32_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) +{ + tcg_gen_shri_i32(a, a, shift); + tcg_gen_add_i32(d, d, a); +} + +static void gen_usra64_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_shri_i64(a, a, shift); + tcg_gen_add_i64(d, d, a); +} + +static void gen_usra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) +{ + tcg_gen_shri_vec(vece, a, a, sh); + tcg_gen_add_vec(vece, d, d, a); +} + +static void gen_shr8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + uint64_t mask = (0xff >> shift) * (-1ull / 0xff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shri_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shr16_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + uint64_t mask = (0xffff >> shift) * (-1ull / 0xffff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shri_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shr32_ins_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) +{ + tcg_gen_shri_i32(a, a, shift); + tcg_gen_deposit_i32(d, d, a, 0, 32 - shift); +} + +static void gen_shr64_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_shri_i64(a, a, shift); + tcg_gen_deposit_i64(d, d, a, 0, 64 - shift); +} + +static void gen_shr_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) +{ + uint64_t mask = (2ull << ((8 << vece) - 1)) - 1; + TCGv_vec t = tcg_temp_new_vec_matching(d); + TCGv_vec m = tcg_temp_new_vec_matching(d); + + tcg_gen_dupi_vec(vece, m, mask ^ (mask >> sh)); + tcg_gen_shri_vec(vece, t, a, sh); + tcg_gen_and_vec(vece, d, d, m); + tcg_gen_or_vec(vece, d, d, t); + + tcg_temp_free_vec(t); + tcg_temp_free_vec(m); +} + /* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, int immh, int immb, int opcode, int rn, int rd) { + static const GVecGen2i ssra_op[4] = { + { .fni8 = gen_ssra8_i64, + .fniv = gen_ssra_vec, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_8 }, + { .fni8 = gen_ssra16_i64, + .fniv = gen_ssra_vec, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_16 }, + { .fni4 = gen_ssra32_i32, + .fniv = gen_ssra_vec, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_32 }, + { .fni8 = gen_ssra64_i64, + .fniv = gen_ssra_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .opc = INDEX_op_sari_vec, + .vece = MO_64 }, + }; + static const GVecGen2i usra_op[4] = { + { .fni8 = gen_usra8_i64, + .fniv = gen_usra_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_8, }, + { .fni8 = gen_usra16_i64, + .fniv = gen_usra_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_16, }, + { .fni4 = gen_usra32_i32, + .fniv = gen_usra_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_32, }, + { .fni8 = gen_usra64_i64, + .fniv = gen_usra_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_64, }, + }; + static const GVecGen2i sri_op[4] = { + { .fni8 = gen_shr8_ins_i64, + .fniv = gen_shr_ins_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_8 }, + { .fni8 = gen_shr16_ins_i64, + .fniv = gen_shr_ins_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_16 }, + { .fni4 = gen_shr32_ins_i32, + .fniv = gen_shr_ins_vec, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_32 }, + { .fni8 = gen_shr64_ins_i64, + .fniv = gen_shr_ins_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .opc = INDEX_op_shri_vec, + .vece = MO_64 }, + }; + int size = 32 - clz32(immh) - 1; int immhb = immh << 3 | immb; int shift = 2 * (8 << size) - immhb; bool accumulate = false; - bool round = false; - bool insert = false; int dsize = is_q ? 128 : 64; int esize = 8 << size; int elements = dsize/esize; @@ -8300,6 +8472,8 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, TCGv_i64 tcg_rn = new_tmp_a64(s); TCGv_i64 tcg_rd = new_tmp_a64(s); TCGv_i64 tcg_round; + uint64_t round_const; + const GVecGen2i *gvec_op; int i; if (extract32(immh, 3, 1) && !is_q) { @@ -8318,64 +8492,141 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, switch (opcode) { case 0x02: /* SSRA / USRA (accumulate) */ - accumulate = true; - break; + if (is_u) { + /* Shift count same as element size produces zero to add. */ + if (shift == 8 << size) { + goto done; + } + gvec_op = &usra_op[size]; + } else { + /* Shift count same as element size produces all sign to add. */ + if (shift == 8 << size) { + shift -= 1; + } + gvec_op = &ssra_op[size]; + } + goto do_gvec; + case 0x08: /* SRI */ + /* Shift count same as element size is valid but does nothing. */ + if (shift == 8 << size) { + goto done; + } + gvec_op = &sri_op[size]; + do_gvec: + tcg_gen_gvec_2i(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift, gvec_op); + return; + + case 0x00: /* SSHR / USHR */ + if (is_u) { + if (shift == 8 << size) { + /* Shift count the same size as element size produces zero. */ + tcg_gen_gvec_dup8i(vec_full_reg_offset(s, rd), + is_q ? 16 : 8, vec_full_reg_size(s), 0); + } else { + tcg_gen_gvec_shri(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift); + } + } else { + /* Shift count the same size as element size produces all sign. */ + if (shift == 8 << size) { + shift -= 1; + } + tcg_gen_gvec_sari(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift); + } + return; + case 0x04: /* SRSHR / URSHR (rounding) */ - round = true; break; case 0x06: /* SRSRA / URSRA (accum + rounding) */ - accumulate = round = true; - break; - case 0x08: /* SRI */ - insert = true; + accumulate = true; break; + default: + g_assert_not_reached(); } - if (round) { - uint64_t round_const = 1ULL << (shift - 1); - tcg_round = tcg_const_i64(round_const); - } else { - tcg_round = NULL; - } + round_const = 1ULL << (shift - 1); + tcg_round = tcg_const_i64(round_const); for (i = 0; i < elements; i++) { read_vec_element(s, tcg_rn, rn, i, memop); - if (accumulate || insert) { + if (accumulate) { read_vec_element(s, tcg_rd, rd, i, memop); } - if (insert) { - handle_shri_with_ins(tcg_rd, tcg_rn, size, shift); - } else { - handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round, - accumulate, is_u, size, shift); - } + handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round, + accumulate, is_u, size, shift); write_vec_element(s, tcg_rd, rd, i, size); } + tcg_temp_free_i64(tcg_round); + done: if (!is_q) { clear_vec_high(s, rd); } +} - if (round) { - tcg_temp_free_i64(tcg_round); - } +static void gen_shl8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + uint64_t mask = ((0xff << shift) & 0xff) * (-1ull / 0xff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shli_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shl16_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + uint64_t mask = ((0xffff << shift) & 0xffff) * (-1ull / 0xffff); + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_shli_i64(t, a, shift); + tcg_gen_andi_i64(t, t, mask); + tcg_gen_andi_i64(d, d, ~mask); + tcg_gen_or_i64(d, d, t); + tcg_temp_free_i64(t); +} + +static void gen_shl32_ins_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) +{ + tcg_gen_deposit_i32(d, d, a, shift, 32 - shift); +} + +static void gen_shl64_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) +{ + tcg_gen_deposit_i64(d, d, a, shift, 64 - shift); +} + +static void gen_shl_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) +{ + uint64_t mask = (1ull << sh) - 1; + TCGv_vec t = tcg_temp_new_vec_matching(d); + TCGv_vec m = tcg_temp_new_vec_matching(d); + + tcg_gen_dupi_vec(vece, m, mask); + tcg_gen_shli_vec(vece, t, a, sh); + tcg_gen_and_vec(vece, d, d, m); + tcg_gen_or_vec(vece, d, d, t); + + tcg_temp_free_vec(t); + tcg_temp_free_vec(m); } /* SHL/SLI - Vector shift left */ static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, - int immh, int immb, int opcode, int rn, int rd) + int immh, int immb, int opcode, int rn, int rd) { int size = 32 - clz32(immh) - 1; int immhb = immh << 3 | immb; int shift = immhb - (8 << size); - int dsize = is_q ? 128 : 64; - int esize = 8 << size; - int elements = dsize/esize; - TCGv_i64 tcg_rn = new_tmp_a64(s); - TCGv_i64 tcg_rd = new_tmp_a64(s); - int i; if (extract32(immh, 3, 1) && !is_q) { unallocated_encoding(s); @@ -8391,19 +8642,40 @@ static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, return; } - for (i = 0; i < elements; i++) { - read_vec_element(s, tcg_rn, rn, i, size); - if (insert) { - read_vec_element(s, tcg_rd, rd, i, size); - } - - handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift); - - write_vec_element(s, tcg_rd, rd, i, size); - } - - if (!is_q) { - clear_vec_high(s, rd); + if (insert) { + static const GVecGen2i shi_op[4] = { + { .fni8 = gen_shl8_ins_i64, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_8 }, + { .fni8 = gen_shl16_ins_i64, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_16 }, + { .fni4 = gen_shl32_ins_i32, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_32 }, + { .fni8 = gen_shl64_ins_i64, + .fniv = gen_shl_ins_vec, + .opc = INDEX_op_shli_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_64 }, + }; + tcg_gen_gvec_2i(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift, &shi_op[size]); + } else { + tcg_gen_gvec_shli(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), is_q ? 16 : 8, + vec_full_reg_size(s), shift); } } From patchwork Sat Jan 6 03:13:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123616 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp129052qgn; Fri, 5 Jan 2018 19:26:11 -0800 (PST) X-Google-Smtp-Source: ACJfBotBzyvq2xViG06PVZq9Qfva5Mj6N+XpNSWu1GkmTya/d1s/sAfOdwe/MR2/evkdMlmpxjse X-Received: by 10.129.172.28 with SMTP id k28mr4711148ywh.274.1515209171318; Fri, 05 Jan 2018 19:26:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209171; cv=none; d=google.com; s=arc-20160816; b=MByTUSoiDN7Nkhu7txBW2cw6HTgnDCgUEqb/SNFmdpXIMu+0CrFzmb1WsPu5Sgmu9L vr+H14L8vfM7eEthck67ErS/OG1uuLcjjDKYNxu0FFdWrCbS9xYwk7xB/mrS5pmmtNs1 wI0F3cHx5L2gKyhwGSRTJuvIKhNDv/w+ASaCizy5CCWVGAeajR+QTdGZgrm137xwhGqu h7SEh6Idi7qkxfc28QqsIxpYVXEEJXP4wRqOSj+sUtLvfLVCuxgOvxSTcx2oD6WSMg6p Gmo8tTFIxWDBPGzTyyhnaNmfACR9E3TTTDp5YJR+L/0jeytE/ISicjtpxPBeCXnZOEFj ypNQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=gzhPOh2O6PV97dzBZo/zSpmop9UKLGT2Pv0pLxPSWwQ=; b=biQXrM+0nwiRlivBs11RIMuNsr2+FTL0Uax7F5x6qvpk/fHdhKQZssxlVlQEJp/PmN S6ao4DsUqavYPq93iElXz9bjvZPLSlJul6Bb1OikyzMnQIXI6clZbgA2aTan9NONhGNg PIb15HnVVD3tWMMw+Qi8jYZJHbLEFXmIfK8h+zREb+XcTOvIZh5dsVrA6QfmZJ1Idzs1 WiSIJQEHRvAbJtyK84c2/10Rf8LT+mA68F1FS/b6EzT7N88SJl2HbqVnmhYKPqg28i68 A0/gsuAj/EL6zhP7ECzgPhhTeicWpye7Hi6008eBOyfrNRZQJvH3bo6P1oy+v24h/Ydd n85w== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Rt1E82bR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id h124si1495842ywc.474.2018.01.05.19.26.11 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:26:11 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=Rt1E82bR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44102 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf7a-0000hL-P4 for patch@linaro.org; Fri, 05 Jan 2018 22:26:10 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48479) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXewD-0007mS-4d for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXew9-0005fN-3m for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:25 -0500 Received: from mail-pg0-x243.google.com ([2607:f8b0:400e:c05::243]:45388) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXew8-0005dS-S7 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:21 -0500 Received: by mail-pg0-x243.google.com with SMTP id c194so1735655pga.12 for ; Fri, 05 Jan 2018 19:14:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=gzhPOh2O6PV97dzBZo/zSpmop9UKLGT2Pv0pLxPSWwQ=; b=Rt1E82bRPHZMjQx6BOHZOv5FuHiRAQdNmN0Y9Wkytg3Vv+IvFTH1pATNskIJeuwe/B VCOEvbf2CjTXYL8Me67hrp54nzbl7kKqz7S6TbstnidJqPtcxdTL44e0LB1bLegyuybL 9NXb08VP3m6mdAF1sJmx9WBFeGstwnR7loEo8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=gzhPOh2O6PV97dzBZo/zSpmop9UKLGT2Pv0pLxPSWwQ=; b=QusouDzcqILegSPMS7iPHxaJvz/ap+P9VnyRX1PkQePZxNFuAF1bSC+GjaQVgO/FC8 L1Biu27wtYXBpwh7cPBIHkyYkC0wRLihgSd0cDsOv/Ac61ABdp5pZ37tY0WHY15s9RFE X9WDVPss5TiFsrIUCQIXsu5xR/WMzG0B6QK4qQseC0xaxsyL2G/EtrvQsg1OzEVQ5umN OTDRgPoxEYGP4eaYj1qTcc6Cjl5/dCYe6+BIHqR9yLyoEga0tl8q4OxFcnZLs3cxGIVr S/ILfaxccJ8g5oPT8rzoFIL+0vYN6fHusHfxVezz4x46/rfSQqs4ZH6ayFwmkc8w4UTR t1Gw== X-Gm-Message-State: AKGB3mKbH2VTplJCfcCI1tL9PbrEIF3QF2Lgd39DaDRauI8k6aR6Paw2 42i61p0yd6rIHlkGx7UZ91GmV4hPTn0= X-Received: by 10.99.179.10 with SMTP id i10mr4109683pgf.41.1515208459591; Fri, 05 Jan 2018 19:14:19 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.18 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:18 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:42 -0800 Message-Id: <20180106031346.6650-20-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::243 Subject: [Qemu-devel] [PATCH v8 19/23] target/arm: Use vector infrastructure for aa64 compares X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 96 ++++++++++++++++++++++++++++++---------------- 1 file changed, 62 insertions(+), 34 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index d8bb3bbb25..44e44cc9f2 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -7115,6 +7115,28 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn) } } +/* CMTST : test is "if (X & Y != 0)". */ +static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_and_i32(d, a, b); + tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0); + tcg_gen_neg_i32(d, d); +} + +static void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_and_i64(d, a, b); + tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0); + tcg_gen_neg_i64(d, d); +} + +static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_and_vec(vece, d, a, b); + tcg_gen_dupi_vec(vece, a, 0); + tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a); +} + static void handle_3same_64(DisasContext *s, int opcode, bool u, TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm) { @@ -7158,10 +7180,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u, cond = TCG_COND_EQ; goto do_cmop; } - /* CMTST : test is "if (X & Y != 0)". */ - tcg_gen_and_i64(tcg_rd, tcg_rn, tcg_rm); - tcg_gen_setcondi_i64(TCG_COND_NE, tcg_rd, tcg_rd, 0); - tcg_gen_neg_i64(tcg_rd, tcg_rd); + gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm); break; case 0x8: /* SSHL, USHL */ if (u) { @@ -9684,6 +9703,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) int rd = extract32(insn, 0, 5); int pass; GVecGen3Fn *gvec_op; + TCGCond cond; switch (opcode) { case 0x13: /* MUL, PMUL */ @@ -9731,6 +9751,44 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s)); return; + case 0x11: + if (u) { /* CMEQ */ + cond = TCG_COND_EQ; + goto do_gvec_cmp; + } else { /* CMTST */ + static const GVecGen3 cmtst_op[4] = { + { .fni4 = gen_helper_neon_tst_u8, + .fniv = gen_cmtst_vec, + .vece = MO_8 }, + { .fni4 = gen_helper_neon_tst_u16, + .fniv = gen_cmtst_vec, + .vece = MO_16 }, + { .fni4 = gen_cmtst_i32, + .fniv = gen_cmtst_vec, + .vece = MO_32 }, + { .fni8 = gen_cmtst_i64, + .fniv = gen_cmtst_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_64 }, + }; + tcg_gen_gvec_3(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), + &cmtst_op[size]); + } + return; + case 0x06: /* CMGT, CMHI */ + cond = u ? TCG_COND_GTU : TCG_COND_GT; + goto do_gvec_cmp; + case 0x07: /* CMGE, CMHS */ + cond = u ? TCG_COND_GEU : TCG_COND_GE; + do_gvec_cmp: + tcg_gen_gvec_cmp(cond, size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s)); + return; } if (size == 3) { @@ -9813,26 +9871,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genenvfn = fns[size][u]; break; } - case 0x6: /* CMGT, CMHI */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_cgt_s8, gen_helper_neon_cgt_u8 }, - { gen_helper_neon_cgt_s16, gen_helper_neon_cgt_u16 }, - { gen_helper_neon_cgt_s32, gen_helper_neon_cgt_u32 }, - }; - genfn = fns[size][u]; - break; - } - case 0x7: /* CMGE, CMHS */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_cge_s8, gen_helper_neon_cge_u8 }, - { gen_helper_neon_cge_s16, gen_helper_neon_cge_u16 }, - { gen_helper_neon_cge_s32, gen_helper_neon_cge_u32 }, - }; - genfn = fns[size][u]; - break; - } case 0x8: /* SSHL, USHL */ { static NeonGenTwoOpFn * const fns[3][2] = { @@ -9905,16 +9943,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn = fns[size][u]; break; } - case 0x11: /* CMTST, CMEQ */ - { - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_tst_u8, gen_helper_neon_ceq_u8 }, - { gen_helper_neon_tst_u16, gen_helper_neon_ceq_u16 }, - { gen_helper_neon_tst_u32, gen_helper_neon_ceq_u32 }, - }; - genfn = fns[size][u]; - break; - } case 0x13: /* MUL, PMUL */ if (u) { /* PMUL */ From patchwork Sat Jan 6 03:13:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123612 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp127436qgn; Fri, 5 Jan 2018 19:23:22 -0800 (PST) X-Google-Smtp-Source: ACJfBouN55CeCYprzY1hbLKEb3SM2FSAQj2sQRebwGn1TubLkR3CwuWRuE4VHpWl4bGvKmQ60yjQ X-Received: by 10.37.109.4 with SMTP id i4mr927361ybc.475.1515209002455; Fri, 05 Jan 2018 19:23:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209002; cv=none; d=google.com; s=arc-20160816; b=s4jZle4NojhTm/ucTvq+b6/rQgubpMJSL0/huRlzQXOZQYDX9i/fKw9ZYsDRgYCZes BSaU/7rx13bLQQmFFnO5+8jD1//KtdLqS5whneMj60rXU10K0HvUmyg2Oz7qPDUwssyy 8ku2TvelAn4zMqoxOpBA3qwKYVvKOpZENeY+G72CrQ7pFOtJpdj5PdeWcTqgSMdNkiu3 Kkym5jt492QVTSSKo2WilyaM9tqnjQVZxvtX8++u+PrcTJ/6PV6AepbvkZ0tMmbONgQb GwPxdaUyQxZ32UuUp3EAna3EW+hh7N4B5u4MTb/iYSpU5b9hHY8AqusrLoT0kxnQiP8G 7ygg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=00hjtn7Jp4MNhaFa8SlxmgesWt1Jh7844uLgFJ4kNSk=; b=hw6ldmVmnDymP3MQz/WzG6pR1HrUi5Lb3vDqcS3WYxWIdANru5jMigz+Ry9lV8Nudv 8PzhCI0DbtdrjaR6PGTeKWsacc3ZjwDNhxXmksC6WokH2578plQE4VufFRDncmdDeETh ZPOB5Lnlel4y8AaArkz64NtjrwftDrnUbYFCtrju/lR4yhH0Pm6Bpeq66PTZC+2F+y1Q qNmVAaz0MZAj0lhTaOnDlt5GlHLqMzdeN78P23IFFy4EOkeLjCmCUs8lPqBD6F7PynMz xZv1Jk82LvI/+zspVygEsG7R9ZbPZC1SQtu6XFnxWkSq6gULQ6HyIo34duuU9+Pergfi CYFA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=AmCrbzcb; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id e67si1450413ywd.371.2018.01.05.19.23.22 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:23:22 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=AmCrbzcb; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44079 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf4r-0006bq-QF for patch@linaro.org; Fri, 05 Jan 2018 22:23:21 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48476) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXewD-0007mP-41 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXewA-0005is-Hv for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:25 -0500 Received: from mail-pg0-x243.google.com ([2607:f8b0:400e:c05::243]:42487) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXewA-0005hA-9S for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:22 -0500 Received: by mail-pg0-x243.google.com with SMTP id q67so2709781pga.9 for ; Fri, 05 Jan 2018 19:14:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=00hjtn7Jp4MNhaFa8SlxmgesWt1Jh7844uLgFJ4kNSk=; b=AmCrbzcbwHntvQJE6ThwC3vQ8ZWnwj6Ax5SVPsewY9BkXzgGkECAKE4LSHJ3PaLPjU 9WbZfTL/i+ThWY4pXbrBoVQUfcCpxD96UrbOkFWwwhIQtNj4LPReE5BYc7WXeCZ5reel cXAfbf93HsSJFwWDf3GI1uzWLCLK263o+Ur48= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=00hjtn7Jp4MNhaFa8SlxmgesWt1Jh7844uLgFJ4kNSk=; b=eooSLesoYLagu11jXT2ZvC/54kSVNXWCmceKsCSvZPDcAJFm+UTwR8rKcwjUQT08yw NII8+gnfNTixPMP05JJw86NUNZOzdzYX8lbXGpGPXNzhZYwgnKX0MTc2n/eADwuisBNd agZpa1Fem7MbKgt6dpNGCBSP3uhwDOKpAtrNOQQMhzLUOsgpHYilZmXS5p+b/cIf0ZYK UQFnc5Lltly/eMZvnn1LVLxpjkBJCaheFFXcPvX3SZ0DCh5SCLmKknVhYlH0Rymkr6Mv 809VpzkHtXxqjvyfyc1xSHEgE9dXFK1Pdv4inaB7itt+2iMMN12Z/2GkMWP2vwaYBuB4 Qe8Q== X-Gm-Message-State: AKGB3mKY1P+KJ1vqNI2EAi942tu8Cv0YzA1/pAlsYfO2qsewySIi4lNN e30Io2219b50khhN8nu4YpF1IetTCuQ= X-Received: by 10.99.148.26 with SMTP id m26mr4096040pge.157.1515208460954; Fri, 05 Jan 2018 19:14:20 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.19 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:20 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:43 -0800 Message-Id: <20180106031346.6650-21-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::243 Subject: [Qemu-devel] [PATCH v8 20/23] target/arm: Use vector infrastructure for aa64 multiplies X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 171 ++++++++++++++++++++++++++++++++++++--------- 1 file changed, 138 insertions(+), 33 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 44e44cc9f2..48caba3d9f 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -9691,6 +9691,66 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn) } } +static void gen_mla8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u8(a, a, b); + gen_helper_neon_add_u8(d, d, a); +} + +static void gen_mla16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u16(a, a, b); + gen_helper_neon_add_u16(d, d, a); +} + +static void gen_mla32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_mul_i32(a, a, b); + tcg_gen_add_i32(d, d, a); +} + +static void gen_mla64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_mul_i64(a, a, b); + tcg_gen_add_i64(d, d, a); +} + +static void gen_mla_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_mul_vec(vece, a, a, b); + tcg_gen_add_vec(vece, d, d, a); +} + +static void gen_mls8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u8(a, a, b); + gen_helper_neon_sub_u8(d, d, a); +} + +static void gen_mls16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + gen_helper_neon_mul_u16(a, a, b); + gen_helper_neon_sub_u16(d, d, a); +} + +static void gen_mls32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) +{ + tcg_gen_mul_i32(a, a, b); + tcg_gen_sub_i32(d, d, a); +} + +static void gen_mls64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) +{ + tcg_gen_mul_i64(a, a, b); + tcg_gen_sub_i64(d, d, a); +} + +static void gen_mls_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) +{ + tcg_gen_mul_vec(vece, a, a, b); + tcg_gen_sub_vec(vece, d, d, a); +} + /* Integer op subgroup of C3.6.16. */ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) { @@ -9702,7 +9762,8 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) int rn = extract32(insn, 5, 5); int rd = extract32(insn, 0, 5); int pass; - GVecGen3Fn *gvec_op; + GVecGen3Fn *gvec_fn; + const GVecGen3 *gvec_op; TCGCond cond; switch (opcode) { @@ -9745,12 +9806,70 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) switch (opcode) { case 0x10: /* ADD, SUB */ - gvec_op = u ? tcg_gen_gvec_sub : tcg_gen_gvec_add; - gvec_op(size, vec_full_reg_offset(s, rd), + gvec_fn = u ? tcg_gen_gvec_sub : tcg_gen_gvec_add; + do_gvec: + gvec_fn(size, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn), vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s)); return; + case 0x13: /* MUL, PMUL */ + if (!u) { /* MUL */ + gvec_fn = tcg_gen_gvec_mul; + goto do_gvec; + } + break; + case 0x12: /* MLA, MLS */ + { + static const GVecGen3 mla_op[4] = { + { .fni4 = gen_mla8_i32, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_8 }, + { .fni4 = gen_mla16_i32, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_16 }, + { .fni4 = gen_mla32_i32, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_32 }, + { .fni8 = gen_mla64_i64, + .fniv = gen_mla_vec, + .opc = INDEX_op_mul_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_64 }, + }; + static const GVecGen3 mls_op[4] = { + { .fni4 = gen_mls8_i32, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_8 }, + { .fni4 = gen_mls16_i32, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_16 }, + { .fni4 = gen_mls32_i32, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .load_dest = true, + .vece = MO_32 }, + { .fni8 = gen_mls64_i64, + .fniv = gen_mls_vec, + .opc = INDEX_op_mul_vec, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .load_dest = true, + .vece = MO_64 }, + }; + gvec_op = (u ? &mls_op[size] : &mla_op[size]); + } + goto do_gvec_op; case 0x11: if (u) { /* CMEQ */ cond = TCG_COND_EQ; @@ -9771,12 +9890,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) .prefer_i64 = TCG_TARGET_REG_BITS == 64, .vece = MO_64 }, }; - tcg_gen_gvec_3(vec_full_reg_offset(s, rd), - vec_full_reg_offset(s, rn), - vec_full_reg_offset(s, rm), - is_q ? 16 : 8, vec_full_reg_size(s), - &cmtst_op[size]); + gvec_op = &cmtst_op[size]; } + do_gvec_op: + tcg_gen_gvec_3(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), gvec_op); return; case 0x06: /* CMGT, CMHI */ cond = u ? TCG_COND_GTU : TCG_COND_GT; @@ -9944,23 +10064,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) break; } case 0x13: /* MUL, PMUL */ - if (u) { - /* PMUL */ - assert(size == 0); - genfn = gen_helper_neon_mul_p8; - break; - } - /* fall through : MUL */ - case 0x12: /* MLA, MLS */ - { - static NeonGenTwoOpFn * const fns[3] = { - gen_helper_neon_mul_u8, - gen_helper_neon_mul_u16, - tcg_gen_mul_i32, - }; - genfn = fns[size]; + assert(u); /* PMUL */ + assert(size == 0); + genfn = gen_helper_neon_mul_p8; break; - } case 0x16: /* SQDMULH, SQRDMULH */ { static NeonGenTwoOpEnvFn * const fns[2][2] = { @@ -9981,18 +10088,16 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genfn(tcg_res, tcg_op1, tcg_op2); } - if (opcode == 0xf || opcode == 0x12) { - /* SABA, UABA, MLA, MLS: accumulating ops */ - static NeonGenTwoOpFn * const fns[3][2] = { - { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 }, - { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 }, - { tcg_gen_add_i32, tcg_gen_sub_i32 }, + if (opcode == 0xf) { + /* SABA, UABA: accumulating ops */ + static NeonGenTwoOpFn * const fns[3] = { + gen_helper_neon_add_u8, + gen_helper_neon_add_u16, + tcg_gen_add_i32, }; - bool is_sub = (opcode == 0x12 && u); /* MLS */ - genfn = fns[size][is_sub]; read_vec_element_i32(s, tcg_op1, rd, pass, MO_32); - genfn(tcg_res, tcg_op1, tcg_res); + fns[size](tcg_res, tcg_op1, tcg_res); } write_vec_element_i32(s, tcg_res, rd, pass, MO_32); From patchwork Sat Jan 6 03:13:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123620 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp132614qgn; Fri, 5 Jan 2018 19:31:38 -0800 (PST) X-Google-Smtp-Source: ACJfBoszXaamXVHyhWnchvCuzAz19braQnlIHpqcqWBAoy1Y/a/ra7SiYySWaDY0ODyWgYsWipS+ X-Received: by 10.129.78.82 with SMTP id c79mr4756092ywb.509.1515209498318; Fri, 05 Jan 2018 19:31:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209498; cv=none; d=google.com; s=arc-20160816; b=t6UT4brgc7fPbIcu0x9TbTMRe1miobWFRleSNudbf1wO6IkMFU/v26mK1H4lQ381Mj x/qtZz+BsFbZRLGLcw3TWkPXEVdkYxbKYNX3WkFBI6VTV13FaNi2iu1mCeiDU6yr+unu doJQLucSIFDC87DIZtRAqbTxoEHXIW2LKWbLMj7vCyxomPABBfdeX2rrjcH0wXvcLnT2 Qqb//OkOh9+Mprwc0f3NDYZuCYinLCasUU7NIlaSt4nF5yUtzf7tNzsy5c+qyNb0HTws AaHdZ/InOW4sAp8Fc+c/Fpn7wWc/aQ2hQY2JXitkT2cbamBbVrrvxYVnGHsuuqnpgQ8z lx0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=Mxo+uP67WSswZbYohOYvV1sBdXZaULpKkE4loDG62vs=; b=pvMqj8I+SKYjHHtdBqpCkOT/5lF5LaJQvNGlpby2aKCncbsJLcD/sh2YuJ03QXR4Bw Spqgw35d2KUSjYaYmKFCsOC7pa3xhEXw6cJwofZxdD7ttjyHXW8vMJD2+ve5R9urHfED Xyu8aU8NjjbSMEDpQmSouRGWNJCISg6TOtco5x4HJ3I9nRzDFiuh7YWprp3lN3m3rlCO mB41fa8eYouGfmZsw24Cxm+QZpdyqbXc54UEEp44Zraxq7gSgUiUc3Qb+Z6lWfYG4sok T+XGayUd1iSYk+9BhhfgglR6/aDvh6iDOBppGzY4+ehigwSq7uuuBlZjT1U6iCCuFtad VjpA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=EdEUfyeP; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id q65si1481412ywf.528.2018.01.05.19.31.38 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:31:38 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=EdEUfyeP; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44313 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXfCr-0005In-TL for patch@linaro.org; Fri, 05 Jan 2018 22:31:37 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48477) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXewD-0007mR-4k for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXewC-0005ml-5K for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:25 -0500 Received: from mail-pg0-x241.google.com ([2607:f8b0:400e:c05::241]:42486) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXewB-0005l6-Vh for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:24 -0500 Received: by mail-pg0-x241.google.com with SMTP id q67so2709803pga.9 for ; Fri, 05 Jan 2018 19:14:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=Mxo+uP67WSswZbYohOYvV1sBdXZaULpKkE4loDG62vs=; b=EdEUfyePa0L9ieIJSvNCusBtoOKI8KAIVb+GuTF6nTE+22+fkP0kXiocKJ6c5/CpzU cGCnNVmcKIk5gMKTMIk1FB68oksNbqzCGEHA8wPJJsKnyaC0B6gWkQKFDjTbaDNMGRqO dxsiS/+BnzzO9shhQthI0+COWk/BFgfIZRAFg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=Mxo+uP67WSswZbYohOYvV1sBdXZaULpKkE4loDG62vs=; b=peYpY/HCwtH9qS28Dd491k+FXI9s+ub/ButttO+X7/6+LsBoOGYLudHtTFD0GVTkYV 3dic6+Q0jZ5i4Dq5B3oQBZSkVM0NzEf6w5eCwjEm695cSTwFyz5JD1GNK1S/OmCnjGWM cOaEJa7CELSFE5NvehgmbqJvLWPMHaLLLToxp7RXUCBvGrFb+hZjbZ1DeK0sJ5LGonbN h+jWVS+8wODXsb3kAc27WK12u2qm6T2x5Ne5w1ZnO4ReZIi/30UHobjqqmT7de0oh2wt rIYrMric1/IZ2LVsBPxEYthfDZ8PQt34bP7rJIp0A+yEVWjUyqAXnqkmL2Wt0X9Vz6jR PZMg== X-Gm-Message-State: AKGB3mKgkJ6MSI2p/gTbQzSyfYtnxi8rbG77B1lOrmmKBdcP5VsGP23d 2lcT2KxojlzyWPY8XkgNMw6ZyYQRruQ= X-Received: by 10.99.99.131 with SMTP id x125mr3459910pgb.110.1515208462777; Fri, 05 Jan 2018 19:14:22 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.21 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:21 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:44 -0800 Message-Id: <20180106031346.6650-22-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::241 Subject: [Qemu-devel] [PATCH v8 21/23] target/arm: Use vector infrastructure for aa64 widening shifts X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) -- 2.14.3 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 48caba3d9f..4f15e58556 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -8705,12 +8705,7 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, int size = 32 - clz32(immh) - 1; int immhb = immh << 3 | immb; int shift = immhb - (8 << size); - int dsize = 64; - int esize = 8 << size; - int elements = dsize/esize; - TCGv_i64 tcg_rn = new_tmp_a64(s); - TCGv_i64 tcg_rd = new_tmp_a64(s); - int i; + GVecGen2Fn *gvec_fn; if (size >= 3) { unallocated_encoding(s); @@ -8721,18 +8716,18 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, return; } - /* For the LL variants the store is larger than the load, - * so if rd == rn we would overwrite parts of our input. - * So load everything right now and use shifts in the main loop. - */ - read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64); - - for (i = 0; i < elements; i++) { - tcg_gen_shri_i64(tcg_rd, tcg_rn, i * esize); - ext_and_shift_reg(tcg_rd, tcg_rd, size | (!is_u << 2), 0); - tcg_gen_shli_i64(tcg_rd, tcg_rd, shift); - write_vec_element(s, tcg_rd, rd, i, size + 1); + if (is_u) { + gvec_fn = is_q ? tcg_gen_gvec_extuh : tcg_gen_gvec_extul; + } else { + gvec_fn = is_q ? tcg_gen_gvec_extsh : tcg_gen_gvec_extsl; } + gvec_fn(size, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), 16, 16); + + /* Perform the shift in the wider format. */ + tcg_gen_gvec_shli(size + 1, vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rd), + 16, vec_full_reg_size(s), shift); } /* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */ From patchwork Sat Jan 6 03:13:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123621 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp132857qgn; Fri, 5 Jan 2018 19:32:00 -0800 (PST) X-Google-Smtp-Source: ACJfBouolsLBVS6lF+U44RLoLw9ffxWoyiBSbQ+sVtFIpogOtsAFRJHEeX8WpsSXpIYX+T2nls0m X-Received: by 10.37.180.130 with SMTP id o2mr4630965ybj.17.1515209520907; Fri, 05 Jan 2018 19:32:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209520; cv=none; d=google.com; s=arc-20160816; b=oGDDOjcU+uAMqKomOQTYaaI/0CohnDXNfta8WFOubClmXCnTdqYjvdQP3Hv8+t+kLa 9JxiMsoO8L/NAREUEU69D53O8H0km5gvK7HxaiYwpkbWAIkKRoLrGoNJ0+hjkR3P27Sf ZoWhk4OAgtbim8VFvE4kQ0FRi+6sSbK7ar5QrOrrZGRPjeDcpzmubQiJ1MoFathH3wzC grbt8u4evrGTI2gAnrB8DmPpyjt5YSeXmtVrVFFy8CXH1hFXfgAI2WhmcvCd9L2SMojA c7cf3bqBVrhFOQlNHFdWSxQIE6LG0U8jdTA/U8wPdVJjJASHxssgx9PGCuddw2EPGuT9 JZkQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=GQXDyjJBLQjGoyeu0DnHUN3p3vtzaCg7zuEmWuXqYoQ=; b=Jv8LIEad1JHX2CXj3lJs0cgYntpKP8r95l53UH8gkqCZcZJWbRMlP5/SFKMXemIcvD 67wXrOn1CBgw0RVb/+WA0VaKXIekoarsjpXKh7Kb5boArnvHFRpqDVQbYPWYLQJvahA8 tPhJZXZkMhHRWG8NWXhap+Rx4Qwiv9ReSjvY2+9pb93WmTeBR9AmScx+UdG49YjT8ofp 7MaBvKXSeMNU5/CuKhzA/Rfxy5y/eQWQ9J+CvoAz6Bxc2/uv6sWYgmpxSmcNX8zw8arU ZwX3eFAbi4oQ0LkSsdypSf3L/I0RHYKpJDTIChcPsjto9Bk8aPX4pUrGYI6n7R0kVOgi cB6w== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=f+hDWnOG; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id l5si1502855ywk.771.2018.01.05.19.32.00 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:32:00 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=f+hDWnOG; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44452 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXfDE-0006GG-AD for patch@linaro.org; Fri, 05 Jan 2018 22:32:00 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48517) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXewK-0007sJ-TG for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXewF-0005u4-20 for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:32 -0500 Received: from mail-pg0-x22e.google.com ([2607:f8b0:400e:c05::22e]:35108) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXewE-0005re-JC for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:26 -0500 Received: by mail-pg0-x22e.google.com with SMTP id d6so1651660pgv.2 for ; Fri, 05 Jan 2018 19:14:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=GQXDyjJBLQjGoyeu0DnHUN3p3vtzaCg7zuEmWuXqYoQ=; b=f+hDWnOGAHm+PNv3P4I1fD9Vq7OcsalX4oqOWesya4u0lDPTGjTrov4AK5P9w8rWbU kmjfcj2xK4v+gdk0rttO/EFonkuuKhBpM23bg3fpN6h4+x64O0sCBtzHjcUzPmniWIwP ++0wkWvmXXDRyJ2zBbqxuO+JfWdyIruQbBMu0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=GQXDyjJBLQjGoyeu0DnHUN3p3vtzaCg7zuEmWuXqYoQ=; b=KnQK8M2Ws1zjebR2zyahXlS0FIxj7Fb6PAjMofL6K8PUWfnCwyW6GDn1T5hfoqHJ0U 2Ru88PiEoZgShq8ePRdumWveGmmFPCYrHN2Twc78W6SswXwfzEp6HkofUKSm/oUM1ANA a5CZ9nIlitaSoni/LASREcf5STaB89JxlGrS0v/MNk+NYsKo/4ud5Q1a9ZTtV2ILjii8 BrUjTmPJ20pbX41ltfgPRxYzqrgvZFgIqemlw4bagHItsHluWMC/irrl3mc0fR/o+vkR qtxNFq2f2GxQUXyDkVdUKaXvM4YUAoOmPJU4fKMFGbOQiKg2ltnuSxubZgTjl6OTJYzb eiTA== X-Gm-Message-State: AKGB3mKBYCmZCGSpBF7HSkSa6ADKWExz1co01P3xxYopih6/Ik6CfGlq aLPyNDtvFbSuJHJIst+U4LlFQk9O+U8= X-Received: by 10.99.106.138 with SMTP id f132mr4156529pgc.115.1515208464264; Fri, 05 Jan 2018 19:14:24 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.22 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:23 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:45 -0800 Message-Id: <20180106031346.6650-23-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::22e Subject: [Qemu-devel] [PATCH v8 22/23] tcg/i386: Add vector operations X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" The x86 vector instruction set is extremely irregular. With newer editions, Intel has filled in some of the blanks. However, we don't get many 64-bit operations until SSE4.2, introduced in 2009. The subsequent edition was for AVX1, introduced in 2011, which added three-operand addressing, and adjusts how all instructions should be encoded. Given the relatively narrow 2 year window between possible to support and desirable to support, and to vastly simplify code maintainence, I am only planning to support AVX1 and later cpus. Signed-off-by: Richard Henderson --- tcg/i386/tcg-target.h | 46 +- tcg/i386/tcg-target.opc.h | 13 + tcg/i386/tcg-target.inc.c | 1331 +++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 1336 insertions(+), 54 deletions(-) create mode 100644 tcg/i386/tcg-target.opc.h -- 2.14.3 diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index b89dababf4..e77b95cc2c 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386/tcg-target.h @@ -30,10 +30,10 @@ #ifdef __x86_64__ # define TCG_TARGET_REG_BITS 64 -# define TCG_TARGET_NB_REGS 16 +# define TCG_TARGET_NB_REGS 32 #else # define TCG_TARGET_REG_BITS 32 -# define TCG_TARGET_NB_REGS 8 +# define TCG_TARGET_NB_REGS 24 #endif typedef enum { @@ -56,6 +56,26 @@ typedef enum { TCG_REG_R13, TCG_REG_R14, TCG_REG_R15, + + TCG_REG_XMM0, + TCG_REG_XMM1, + TCG_REG_XMM2, + TCG_REG_XMM3, + TCG_REG_XMM4, + TCG_REG_XMM5, + TCG_REG_XMM6, + TCG_REG_XMM7, + + /* 64-bit registers; likewise always define. */ + TCG_REG_XMM8, + TCG_REG_XMM9, + TCG_REG_XMM10, + TCG_REG_XMM11, + TCG_REG_XMM12, + TCG_REG_XMM13, + TCG_REG_XMM14, + TCG_REG_XMM15, + TCG_REG_RAX = TCG_REG_EAX, TCG_REG_RCX = TCG_REG_ECX, TCG_REG_RDX = TCG_REG_EDX, @@ -77,6 +97,8 @@ typedef enum { extern bool have_bmi1; extern bool have_popcnt; +extern bool have_avx1; +extern bool have_avx2; /* optional instructions */ #define TCG_TARGET_HAS_div2_i32 1 @@ -146,6 +168,26 @@ extern bool have_popcnt; #define TCG_TARGET_HAS_mulsh_i64 0 #endif +/* We do not support older SSE systems, only beginning with AVX1. */ +#define TCG_TARGET_HAS_v64 have_avx1 +#define TCG_TARGET_HAS_v128 have_avx1 +#define TCG_TARGET_HAS_v256 have_avx2 + +#define TCG_TARGET_HAS_andc_vec 1 +#define TCG_TARGET_HAS_orc_vec 0 +#define TCG_TARGET_HAS_not_vec 0 +#define TCG_TARGET_HAS_neg_vec 0 +#define TCG_TARGET_HAS_shi_vec 1 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 +#define TCG_TARGET_HAS_zip_vec 1 +#define TCG_TARGET_HAS_uzp_vec 0 +#define TCG_TARGET_HAS_trn_vec 0 +#define TCG_TARGET_HAS_cmp_vec 1 +#define TCG_TARGET_HAS_mul_vec 1 +#define TCG_TARGET_HAS_extl_vec 1 +#define TCG_TARGET_HAS_exth_vec 0 + #define TCG_TARGET_deposit_i32_valid(ofs, len) \ (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \ ((ofs) == 0 && (len) == 16)) diff --git a/tcg/i386/tcg-target.opc.h b/tcg/i386/tcg-target.opc.h new file mode 100644 index 0000000000..e5fa88ba25 --- /dev/null +++ b/tcg/i386/tcg-target.opc.h @@ -0,0 +1,13 @@ +/* Target-specific opcodes for host vector expansion. These will be + emitted by tcg_expand_vec_op. For those familiar with GCC internals, + consider these to be UNSPEC with names. */ + +DEF(x86_shufps_vec, 1, 2, 1, IMPLVEC) +DEF(x86_vpblendvb_vec, 1, 3, 0, IMPLVEC) +DEF(x86_blend_vec, 1, 2, 1, IMPLVEC) +DEF(x86_packss_vec, 1, 2, 0, IMPLVEC) +DEF(x86_packus_vec, 1, 2, 0, IMPLVEC) +DEF(x86_psrldq_vec, 1, 1, 1, IMPLVEC) +DEF(x86_vperm2i128_vec, 1, 2, 1, IMPLVEC) +DEF(x86_punpckl_vec, 1, 2, 0, IMPLVEC) +DEF(x86_punpckh_vec, 1, 2, 0, IMPLVEC) diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c index 63d27f10e7..4805da6130 100644 --- a/tcg/i386/tcg-target.inc.c +++ b/tcg/i386/tcg-target.inc.c @@ -28,9 +28,14 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { #if TCG_TARGET_REG_BITS == 64 "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi", - "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15", #else "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi", +#endif + "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15", + "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7", +#if TCG_TARGET_REG_BITS == 64 + "%xmm8", "%xmm9", "%xmm10", "%xmm11", + "%xmm12", "%xmm13", "%xmm14", "%xmm15", #endif }; #endif @@ -60,6 +65,28 @@ static const int tcg_target_reg_alloc_order[] = { TCG_REG_ECX, TCG_REG_EDX, TCG_REG_EAX, +#endif + TCG_REG_XMM0, + TCG_REG_XMM1, + TCG_REG_XMM2, + TCG_REG_XMM3, + TCG_REG_XMM4, + TCG_REG_XMM5, +#ifndef _WIN64 + /* The Win64 ABI has xmm6-xmm15 as caller-saves, and we do not save + any of them. Therefore only allow xmm0-xmm5 to be allocated. */ + TCG_REG_XMM6, + TCG_REG_XMM7, +#if TCG_TARGET_REG_BITS == 64 + TCG_REG_XMM8, + TCG_REG_XMM9, + TCG_REG_XMM10, + TCG_REG_XMM11, + TCG_REG_XMM12, + TCG_REG_XMM13, + TCG_REG_XMM14, + TCG_REG_XMM15, +#endif #endif }; @@ -94,7 +121,7 @@ static const int tcg_target_call_oarg_regs[] = { #define TCG_CT_CONST_I32 0x400 #define TCG_CT_CONST_WSZ 0x800 -/* Registers used with L constraint, which are the first argument +/* Registers used with L constraint, which are the first argument registers on x86_64, and two random call clobbered registers on i386. */ #if TCG_TARGET_REG_BITS == 64 @@ -125,6 +152,8 @@ static bool have_cmov; it there. Therefore we always define the variable. */ bool have_bmi1; bool have_popcnt; +bool have_avx1; +bool have_avx2; #ifdef CONFIG_CPUID_H static bool have_movbe; @@ -148,6 +177,8 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type, if (value != (int32_t)value) { tcg_abort(); } + /* FALLTHRU */ + case R_386_32: tcg_patch32(code_ptr, value); break; case R_386_PC8: @@ -162,6 +193,14 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type, } } +#if TCG_TARGET_REG_BITS == 64 +#define ALL_GENERAL_REGS 0x0000ffffu +#define ALL_VECTOR_REGS 0xffff0000u +#else +#define ALL_GENERAL_REGS 0x000000ffu +#define ALL_VECTOR_REGS 0x00ff0000u +#endif + /* parse target specific constraints */ static const char *target_parse_constraint(TCGArgConstraint *ct, const char *ct_str, TCGType type) @@ -192,21 +231,29 @@ static const char *target_parse_constraint(TCGArgConstraint *ct, tcg_regset_set_reg(ct->u.regs, TCG_REG_EDI); break; case 'q': + /* A register that can be used as a byte operand. */ ct->ct |= TCG_CT_REG; ct->u.regs = TCG_TARGET_REG_BITS == 64 ? 0xffff : 0xf; break; case 'Q': + /* A register with an addressable second byte (e.g. %ah). */ ct->ct |= TCG_CT_REG; ct->u.regs = 0xf; break; case 'r': + /* A general register. */ ct->ct |= TCG_CT_REG; - ct->u.regs = TCG_TARGET_REG_BITS == 64 ? 0xffff : 0xff; + ct->u.regs |= ALL_GENERAL_REGS; break; case 'W': /* With TZCNT/LZCNT, we can have operand-size as an input. */ ct->ct |= TCG_CT_CONST_WSZ; break; + case 'x': + /* A vector register. */ + ct->ct |= TCG_CT_REG; + ct->u.regs |= ALL_VECTOR_REGS; + break; /* qemu_ld/st address constraint */ case 'L': @@ -277,14 +324,17 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, # define P_REXB_RM 0 # define P_GS 0 #endif -#define P_SIMDF3 0x10000 /* 0xf3 opcode prefix */ -#define P_SIMDF2 0x20000 /* 0xf2 opcode prefix */ +#define P_EXT3A 0x10000 /* 0x0f 0x3a opcode prefix */ +#define P_SIMDF3 0x20000 /* 0xf3 opcode prefix */ +#define P_SIMDF2 0x40000 /* 0xf2 opcode prefix */ +#define P_VEXL 0x80000 /* Set VEX.L = 1 */ #define OPC_ARITH_EvIz (0x81) #define OPC_ARITH_EvIb (0x83) #define OPC_ARITH_GvEv (0x03) /* ... plus (ARITH_FOO << 3) */ #define OPC_ANDN (0xf2 | P_EXT38) #define OPC_ADD_GvEv (OPC_ARITH_GvEv | (ARITH_ADD << 3)) +#define OPC_BLENDPS (0x0c | P_EXT3A | P_DATA16) #define OPC_BSF (0xbc | P_EXT) #define OPC_BSR (0xbd | P_EXT) #define OPC_BSWAP (0xc8 | P_EXT) @@ -310,11 +360,68 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_MOVL_Iv (0xb8) #define OPC_MOVBE_GyMy (0xf0 | P_EXT38) #define OPC_MOVBE_MyGy (0xf1 | P_EXT38) +#define OPC_MOVD_VyEy (0x6e | P_EXT | P_DATA16) +#define OPC_MOVD_EyVy (0x7e | P_EXT | P_DATA16) +#define OPC_MOVDDUP (0x12 | P_EXT | P_SIMDF2) +#define OPC_MOVDQA_VxWx (0x6f | P_EXT | P_DATA16) +#define OPC_MOVDQA_WxVx (0x7f | P_EXT | P_DATA16) +#define OPC_MOVDQU_VxWx (0x6f | P_EXT | P_SIMDF3) +#define OPC_MOVDQU_WxVx (0x7f | P_EXT | P_SIMDF3) +#define OPC_MOVQ_VqWq (0x7e | P_EXT | P_SIMDF3) +#define OPC_MOVQ_WqVq (0xd6 | P_EXT | P_DATA16) #define OPC_MOVSBL (0xbe | P_EXT) #define OPC_MOVSWL (0xbf | P_EXT) #define OPC_MOVSLQ (0x63 | P_REXW) #define OPC_MOVZBL (0xb6 | P_EXT) #define OPC_MOVZWL (0xb7 | P_EXT) +#define OPC_PACKSSDW (0x6b | P_EXT | P_DATA16) +#define OPC_PACKSSWB (0x63 | P_EXT | P_DATA16) +#define OPC_PACKUSDW (0x2b | P_EXT38 | P_DATA16) +#define OPC_PACKUSWB (0x67 | P_EXT | P_DATA16) +#define OPC_PADDB (0xfc | P_EXT | P_DATA16) +#define OPC_PADDW (0xfd | P_EXT | P_DATA16) +#define OPC_PADDD (0xfe | P_EXT | P_DATA16) +#define OPC_PADDQ (0xd4 | P_EXT | P_DATA16) +#define OPC_PAND (0xdb | P_EXT | P_DATA16) +#define OPC_PANDN (0xdf | P_EXT | P_DATA16) +#define OPC_PBLENDW (0x0e | P_EXT3A | P_DATA16) +#define OPC_PCMPEQB (0x74 | P_EXT | P_DATA16) +#define OPC_PCMPEQW (0x75 | P_EXT | P_DATA16) +#define OPC_PCMPEQD (0x76 | P_EXT | P_DATA16) +#define OPC_PCMPEQQ (0x29 | P_EXT38 | P_DATA16) +#define OPC_PCMPGTB (0x64 | P_EXT | P_DATA16) +#define OPC_PCMPGTW (0x65 | P_EXT | P_DATA16) +#define OPC_PCMPGTD (0x66 | P_EXT | P_DATA16) +#define OPC_PCMPGTQ (0x37 | P_EXT38 | P_DATA16) +#define OPC_PMOVSXBW (0x20 | P_EXT38 | P_DATA16) +#define OPC_PMOVSXWD (0x23 | P_EXT38 | P_DATA16) +#define OPC_PMOVSXDQ (0x25 | P_EXT38 | P_DATA16) +#define OPC_PMOVZXBW (0x30 | P_EXT38 | P_DATA16) +#define OPC_PMOVZXWD (0x33 | P_EXT38 | P_DATA16) +#define OPC_PMOVZXDQ (0x35 | P_EXT38 | P_DATA16) +#define OPC_PMULLW (0xd5 | P_EXT | P_DATA16) +#define OPC_PMULLD (0x40 | P_EXT38 | P_DATA16) +#define OPC_POR (0xeb | P_EXT | P_DATA16) +#define OPC_PSHUFB (0x00 | P_EXT38 | P_DATA16) +#define OPC_PSHUFD (0x70 | P_EXT | P_DATA16) +#define OPC_PSHUFLW (0x70 | P_EXT | P_SIMDF2) +#define OPC_PSHUFHW (0x70 | P_EXT | P_SIMDF3) +#define OPC_PSHIFTW_Ib (0x71 | P_EXT | P_DATA16) /* /2 /6 /4 */ +#define OPC_PSHIFTD_Ib (0x72 | P_EXT | P_DATA16) /* /2 /6 /4 */ +#define OPC_PSHIFTQ_Ib (0x73 | P_EXT | P_DATA16) /* /2 /6 /4 */ +#define OPC_PSUBB (0xf8 | P_EXT | P_DATA16) +#define OPC_PSUBW (0xf9 | P_EXT | P_DATA16) +#define OPC_PSUBD (0xfa | P_EXT | P_DATA16) +#define OPC_PSUBQ (0xfb | P_EXT | P_DATA16) +#define OPC_PUNPCKLBW (0x60 | P_EXT | P_DATA16) +#define OPC_PUNPCKLWD (0x61 | P_EXT | P_DATA16) +#define OPC_PUNPCKLDQ (0x62 | P_EXT | P_DATA16) +#define OPC_PUNPCKLQDQ (0x6c | P_EXT | P_DATA16) +#define OPC_PUNPCKHBW (0x68 | P_EXT | P_DATA16) +#define OPC_PUNPCKHWD (0x69 | P_EXT | P_DATA16) +#define OPC_PUNPCKHDQ (0x6a | P_EXT | P_DATA16) +#define OPC_PUNPCKHQDQ (0x6d | P_EXT | P_DATA16) +#define OPC_PXOR (0xef | P_EXT | P_DATA16) #define OPC_POP_r32 (0x58) #define OPC_POPCNT (0xb8 | P_EXT | P_SIMDF3) #define OPC_PUSH_r32 (0x50) @@ -326,14 +433,26 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type, #define OPC_SHIFT_Ib (0xc1) #define OPC_SHIFT_cl (0xd3) #define OPC_SARX (0xf7 | P_EXT38 | P_SIMDF3) +#define OPC_SHUFPS (0xc6 | P_EXT) #define OPC_SHLX (0xf7 | P_EXT38 | P_DATA16) #define OPC_SHRX (0xf7 | P_EXT38 | P_SIMDF2) #define OPC_TESTL (0x85) #define OPC_TZCNT (0xbc | P_EXT | P_SIMDF3) +#define OPC_UD2 (0x0b | P_EXT) +#define OPC_VPBLENDD (0x02 | P_EXT3A | P_DATA16) +#define OPC_VPBLENDVB (0x4c | P_EXT3A | P_DATA16) +#define OPC_VPBROADCASTB (0x78 | P_EXT38 | P_DATA16) +#define OPC_VPBROADCASTW (0x79 | P_EXT38 | P_DATA16) +#define OPC_VPBROADCASTD (0x58 | P_EXT38 | P_DATA16) +#define OPC_VPBROADCASTQ (0x59 | P_EXT38 | P_DATA16) +#define OPC_VPERMQ (0x00 | P_EXT3A | P_DATA16 | P_REXW) +#define OPC_VPERM2I128 (0x46 | P_EXT3A | P_DATA16 | P_VEXL) +#define OPC_VZEROUPPER (0x77 | P_EXT) #define OPC_XCHG_ax_r32 (0x90) #define OPC_GRP3_Ev (0xf7) #define OPC_GRP5 (0xff) +#define OPC_GRP14 (0x73 | P_EXT | P_DATA16) /* Group 1 opcode extensions for 0x80-0x83. These are also used as modifiers for OPC_ARITH. */ @@ -439,10 +558,12 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x) tcg_out8(s, (uint8_t)(rex | 0x40)); } - if (opc & (P_EXT | P_EXT38)) { + if (opc & (P_EXT | P_EXT38 | P_EXT3A)) { tcg_out8(s, 0x0f); if (opc & P_EXT38) { tcg_out8(s, 0x38); + } else if (opc & P_EXT3A) { + tcg_out8(s, 0x3a); } } @@ -459,10 +580,12 @@ static void tcg_out_opc(TCGContext *s, int opc) } else if (opc & P_SIMDF2) { tcg_out8(s, 0xf2); } - if (opc & (P_EXT | P_EXT38)) { + if (opc & (P_EXT | P_EXT38 | P_EXT3A)) { tcg_out8(s, 0x0f); if (opc & P_EXT38) { tcg_out8(s, 0x38); + } else if (opc & P_EXT3A) { + tcg_out8(s, 0x3a); } } tcg_out8(s, opc); @@ -479,34 +602,42 @@ static void tcg_out_modrm(TCGContext *s, int opc, int r, int rm) tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } -static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) +static void tcg_out_vex_opc(TCGContext *s, int opc, int r, int v, + int rm, int index) { int tmp; - if ((opc & (P_REXW | P_EXT | P_EXT38)) || (rm & 8)) { + /* Use the two byte form if possible, which cannot encode + VEX.W, VEX.B, VEX.X, or an m-mmmm field other than P_EXT. */ + if ((opc & (P_EXT | P_EXT38 | P_EXT3A | P_REXW)) == P_EXT + && ((rm | index) & 8) == 0) { + /* Two byte VEX prefix. */ + tcg_out8(s, 0xc5); + + tmp = (r & 8 ? 0 : 0x80); /* VEX.R */ + } else { /* Three byte VEX prefix. */ tcg_out8(s, 0xc4); /* VEX.m-mmmm */ - if (opc & P_EXT38) { + if (opc & P_EXT3A) { + tmp = 3; + } else if (opc & P_EXT38) { tmp = 2; } else if (opc & P_EXT) { tmp = 1; } else { - tcg_abort(); + g_assert_not_reached(); } - tmp |= 0x40; /* VEX.X */ - tmp |= (r & 8 ? 0 : 0x80); /* VEX.R */ - tmp |= (rm & 8 ? 0 : 0x20); /* VEX.B */ + tmp |= (r & 8 ? 0 : 0x80); /* VEX.R */ + tmp |= (index & 8 ? 0 : 0x40); /* VEX.X */ + tmp |= (rm & 8 ? 0 : 0x20); /* VEX.B */ tcg_out8(s, tmp); - tmp = (opc & P_REXW ? 0x80 : 0); /* VEX.W */ - } else { - /* Two byte VEX prefix. */ - tcg_out8(s, 0xc5); - - tmp = (r & 8 ? 0 : 0x80); /* VEX.R */ + tmp = (opc & P_REXW ? 0x80 : 0); /* VEX.W */ } + + tmp |= (opc & P_VEXL ? 0x04 : 0); /* VEX.L */ /* VEX.pp */ if (opc & P_DATA16) { tmp |= 1; /* 0x66 */ @@ -518,6 +649,11 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) tmp |= (~v & 15) << 3; /* VEX.vvvv */ tcg_out8(s, tmp); tcg_out8(s, opc); +} + +static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) +{ + tcg_out_vex_opc(s, opc, r, v, rm, 0); tcg_out8(s, 0xc0 | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } @@ -526,8 +662,8 @@ static void tcg_out_vex_modrm(TCGContext *s, int opc, int r, int v, int rm) mode for absolute addresses, ~RM is the size of the immediate operand that will follow the instruction. */ -static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, - int index, int shift, intptr_t offset) +static void tcg_out_sib_offset(TCGContext *s, int r, int rm, int index, + int shift, intptr_t offset) { int mod, len; @@ -538,7 +674,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, intptr_t pc = (intptr_t)s->code_ptr + 5 + ~rm; intptr_t disp = offset - pc; if (disp == (int32_t)disp) { - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (LOWREGMASK(r) << 3) | 5); tcg_out32(s, disp); return; @@ -548,7 +683,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, use of the MODRM+SIB encoding and is therefore larger than rip-relative addressing. */ if (offset == (int32_t)offset) { - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (4 << 3) | 5); tcg_out32(s, offset); @@ -556,10 +690,9 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, } /* ??? The memory isn't directly addressable. */ - tcg_abort(); + g_assert_not_reached(); } else { /* Absolute address. */ - tcg_out_opc(s, opc, r, 0, 0); tcg_out8(s, (r << 3) | 5); tcg_out32(s, offset); return; @@ -582,7 +715,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, that would be used for %esp is the escape to the two byte form. */ if (index < 0 && LOWREGMASK(rm) != TCG_REG_ESP) { /* Single byte MODRM format. */ - tcg_out_opc(s, opc, r, rm, 0); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | LOWREGMASK(rm)); } else { /* Two byte MODRM+SIB format. */ @@ -596,7 +728,6 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, tcg_debug_assert(index != TCG_REG_ESP); } - tcg_out_opc(s, opc, r, rm, index); tcg_out8(s, mod | (LOWREGMASK(r) << 3) | 4); tcg_out8(s, (shift << 6) | (LOWREGMASK(index) << 3) | LOWREGMASK(rm)); } @@ -608,6 +739,21 @@ static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, } } +static void tcg_out_modrm_sib_offset(TCGContext *s, int opc, int r, int rm, + int index, int shift, intptr_t offset) +{ + tcg_out_opc(s, opc, r, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + +static void tcg_out_vex_modrm_sib_offset(TCGContext *s, int opc, int r, int v, + int rm, int index, int shift, + intptr_t offset) +{ + tcg_out_vex_opc(s, opc, r, v, rm < 0 ? 0 : rm, index < 0 ? 0 : index); + tcg_out_sib_offset(s, r, rm, index, shift, offset); +} + /* A simplification of the above with no index or shift. */ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, int rm, intptr_t offset) @@ -615,6 +761,30 @@ static inline void tcg_out_modrm_offset(TCGContext *s, int opc, int r, tcg_out_modrm_sib_offset(s, opc, r, rm, -1, 0, offset); } +static inline void tcg_out_vex_modrm_offset(TCGContext *s, int opc, int r, + int v, int rm, intptr_t offset) +{ + tcg_out_vex_modrm_sib_offset(s, opc, r, v, rm, -1, 0, offset); +} + +/* Output an opcode with an expected reference to the constant pool. */ +static inline void tcg_out_modrm_pool(TCGContext *s, int opc, int r) +{ + tcg_out_opc(s, opc, r, 0, 0); + /* Absolute for 32-bit, pc-relative for 64-bit. */ + tcg_out8(s, LOWREGMASK(r) << 3 | 5); + tcg_out32(s, 0); +} + +/* Output an opcode with an expected reference to the constant pool. */ +static inline void tcg_out_vex_modrm_pool(TCGContext *s, int opc, int r) +{ + tcg_out_vex_opc(s, opc, r, 0, 0, 0); + /* Absolute for 32-bit, pc-relative for 64-bit. */ + tcg_out8(s, LOWREGMASK(r) << 3 | 5); + tcg_out32(s, 0); +} + /* Generate dest op= src. Uses the same ARITH_* codes as tgen_arithi. */ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) { @@ -625,12 +795,116 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src) tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src); } -static inline void tcg_out_mov(TCGContext *s, TCGType type, - TCGReg ret, TCGReg arg) +static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) +{ + int rexw = 0; + + if (arg == ret) { + return; + } + switch (type) { + case TCG_TYPE_I64: + rexw = P_REXW; + /* fallthru */ + case TCG_TYPE_I32: + if (ret < 16) { + if (arg < 16) { + tcg_out_modrm(s, OPC_MOVL_GvEv + rexw, ret, arg); + } else { + tcg_out_vex_modrm(s, OPC_MOVD_EyVy + rexw, arg, 0, ret); + } + } else { + if (arg < 16) { + tcg_out_vex_modrm(s, OPC_MOVD_VyEy + rexw, ret, 0, arg); + } else { + tcg_out_vex_modrm(s, OPC_MOVQ_VqWq, ret, 0, arg); + } + } + break; + + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_vex_modrm(s, OPC_MOVQ_VqWq, ret, 0, arg); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_vex_modrm(s, OPC_MOVDQA_VxWx, ret, 0, arg); + break; + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16 && arg >= 16); + tcg_out_vex_modrm(s, OPC_MOVDQA_VxWx | P_VEXL, ret, 0, arg); + break; + + default: + g_assert_not_reached(); + } +} + +static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, + TCGReg r, TCGReg a) +{ + if (have_avx2) { + static const int dup_insn[4] = { + OPC_VPBROADCASTB, OPC_VPBROADCASTW, + OPC_VPBROADCASTD, OPC_VPBROADCASTQ, + }; + int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0); + tcg_out_vex_modrm(s, dup_insn[vece] + vex_l, r, 0, a); + } else { + switch (vece) { + case MO_8: + /* ??? With zero in a register, use PSHUFB. */ + tcg_out_vex_modrm(s, OPC_PUNPCKLBW, r, 0, a); + a = r; + /* FALLTHRU */ + case MO_16: + tcg_out_vex_modrm(s, OPC_PUNPCKLWD, r, 0, a); + a = r; + /* FALLTHRU */ + case MO_32: + tcg_out_vex_modrm(s, OPC_PSHUFD, r, 0, a); + /* imm8 operand: all output lanes selected from input lane 0. */ + tcg_out8(s, 0); + break; + case MO_64: + tcg_out_vex_modrm(s, OPC_PUNPCKLQDQ, r, 0, a); + break; + default: + g_assert_not_reached(); + } + } +} + +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, + TCGReg ret, tcg_target_long arg) { - if (arg != ret) { - int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm(s, opc, ret, arg); + int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0); + + if (arg == 0) { + tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret); + return; + } + if (arg == -1) { + tcg_out_vex_modrm(s, OPC_PCMPEQB + vex_l, ret, ret, ret); + return; + } + + if (TCG_TARGET_REG_BITS == 64) { + if (type == TCG_TYPE_V64) { + tcg_out_vex_modrm_pool(s, OPC_MOVQ_VqWq, ret); + } else if (have_avx2) { + tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret); + } else { + tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret); + } + new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4); + } else if (have_avx2) { + tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret); + new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0); + } else { + tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy, ret); + new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0); + tcg_out_dup_vec(s, type, MO_32, ret, ret); } } @@ -639,6 +913,25 @@ static void tcg_out_movi(TCGContext *s, TCGType type, { tcg_target_long diff; + switch (type) { + case TCG_TYPE_I32: +#if TCG_TARGET_REG_BITS == 64 + case TCG_TYPE_I64: +#endif + if (ret < 16) { + break; + } + /* fallthru */ + case TCG_TYPE_V64: + case TCG_TYPE_V128: + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16); + tcg_out_dupi_vec(s, type, ret, arg); + return; + default: + g_assert_not_reached(); + } + if (arg == 0) { tgen_arithr(s, ARITH_XOR, ret, ret); return; @@ -667,6 +960,59 @@ static void tcg_out_movi(TCGContext *s, TCGType type, tcg_out64(s, arg); } +static void tcg_out_movi_vec(TCGContext *s, TCGType type, + TCGReg ret, const TCGArg *a) +{ + int n = (64 / TCG_TARGET_REG_BITS) << (type - TCG_TYPE_V64); + int opc, ofs, rel; + + tcg_debug_assert(ret >= 16); + tcg_debug_assert(type >= TCG_TYPE_V64); + + /* We assume that INDEX_op_dupi could not be used and therefore + we must use a constant pool entry. */ + + switch (type) { + case TCG_TYPE_V64: + opc = OPC_MOVQ_VqWq; + break; + case TCG_TYPE_V128: + opc = OPC_MOVDQU_VxWx; + break; + case TCG_TYPE_V256: + opc = OPC_MOVDQU_VxWx | P_VEXL; + break; + default: + g_assert_not_reached(); + } + tcg_out_vex_modrm_pool(s, opc, ret); + + if (TCG_TARGET_REG_BITS == 64) { + rel = R_386_PC32, ofs = -4; + } else { + rel = R_386_32, ofs = 0; + } + switch (n) { + case 1: + new_pool_label(s, a[0], rel, s->code_ptr - 4, rel); + break; + case 2: + new_pool_l2(s, rel, s->code_ptr - 4, ofs, a[0], a[1]); + break; + case 4: + new_pool_l4(s, rel, s->code_ptr - 4, ofs, a[0], a[1], a[2], a[3]); + break; +#if TCG_TARGET_REG_BITS == 32 + case 8: + new_pool_l8(s, rel, s->code_ptr - 4, ofs, + a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]); + break; +#endif + default: + g_assert_not_reached(); + } +} + static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val) { if (val == (int8_t)val) { @@ -702,18 +1048,74 @@ static inline void tcg_out_pop(TCGContext *s, int reg) tcg_out_opc(s, OPC_POP_r32 + LOWREGMASK(reg), 0, reg, 0); } -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg arg1, intptr_t arg2) { - int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm_offset(s, opc, ret, arg1, arg2); + switch (type) { + case TCG_TYPE_I32: + if (ret < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_GvEv, ret, arg1, arg2); + } else { + tcg_out_vex_modrm_offset(s, OPC_MOVD_VyEy, ret, 0, arg1, arg2); + } + break; + case TCG_TYPE_I64: + if (ret < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_GvEv | P_REXW, ret, arg1, arg2); + break; + } + /* FALLTHRU */ + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2); + break; + case TCG_TYPE_V256: + tcg_debug_assert(ret >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL, + ret, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, + TCGReg arg1, intptr_t arg2) { - int opc = OPC_MOVL_EvGv + (type == TCG_TYPE_I64 ? P_REXW : 0); - tcg_out_modrm_offset(s, opc, arg, arg1, arg2); + switch (type) { + case TCG_TYPE_I32: + if (arg < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_EvGv, arg, arg1, arg2); + } else { + tcg_out_vex_modrm_offset(s, OPC_MOVD_EyVy, arg, 0, arg1, arg2); + } + break; + case TCG_TYPE_I64: + if (arg < 16) { + tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_REXW, arg, arg1, arg2); + break; + } + /* FALLTHRU */ + case TCG_TYPE_V64: + tcg_debug_assert(arg >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2); + break; + case TCG_TYPE_V128: + tcg_debug_assert(arg >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2); + break; + case TCG_TYPE_V256: + tcg_debug_assert(arg >= 16); + tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL, + arg, 0, arg1, arg2); + break; + default: + g_assert_not_reached(); + } } static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, @@ -725,6 +1127,8 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, return false; } rexw = P_REXW; + } else if (type != TCG_TYPE_I32) { + return false; } tcg_out_modrm_offset(s, OPC_MOVL_EvIz | rexw, 0, base, ofs); tcg_out32(s, val); @@ -2259,8 +2663,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, break; case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: + case INDEX_op_mov_vec: case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi. */ case INDEX_op_movi_i64: + case INDEX_op_dupi_vec: case INDEX_op_call: /* Always emitted via tcg_out_call. */ default: tcg_abort(); @@ -2269,6 +2675,206 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, #undef OP_32_64 } +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, + unsigned vecl, unsigned vece, + const TCGArg *args, const int *const_args) +{ + static int const add_insn[4] = { + OPC_PADDB, OPC_PADDW, OPC_PADDD, OPC_PADDQ + }; + static int const sub_insn[4] = { + OPC_PSUBB, OPC_PSUBW, OPC_PSUBD, OPC_PSUBQ + }; + static int const mul_insn[4] = { + OPC_UD2, OPC_PMULLW, OPC_PMULLD, OPC_UD2 + }; + static int const shift_imm_insn[4] = { + OPC_UD2, OPC_PSHIFTW_Ib, OPC_PSHIFTD_Ib, OPC_PSHIFTQ_Ib + }; + static int const cmpeq_insn[4] = { + OPC_PCMPEQB, OPC_PCMPEQW, OPC_PCMPEQD, OPC_PCMPEQQ + }; + static int const cmpgt_insn[4] = { + OPC_PCMPGTB, OPC_PCMPGTW, OPC_PCMPGTD, OPC_PCMPGTQ + }; + static int const punpckl_insn[4] = { + OPC_PUNPCKLBW, OPC_PUNPCKLWD, OPC_PUNPCKLDQ, OPC_PUNPCKLQDQ + }; + static int const punpckh_insn[4] = { + OPC_PUNPCKHBW, OPC_PUNPCKHWD, OPC_PUNPCKHDQ, OPC_PUNPCKHQDQ + }; + static int const packss_insn[4] = { + OPC_PACKSSWB, OPC_PACKSSDW, OPC_UD2, OPC_UD2 + }; + static int const packus_insn[4] = { + OPC_PACKUSWB, OPC_PACKUSDW, OPC_UD2, OPC_UD2 + }; + static int const pmovsx_insn[3] = { + OPC_PMOVSXBW, OPC_PMOVSXWD, OPC_PMOVSXDQ + }; + static int const pmovzx_insn[3] = { + OPC_PMOVZXBW, OPC_PMOVZXWD, OPC_PMOVZXDQ + }; + + TCGType type = vecl + TCG_TYPE_V64; + int insn, sub; + TCGArg a0, a1, a2; + + a0 = args[0]; + a1 = args[1]; + a2 = args[2]; + + switch (opc) { + case INDEX_op_add_vec: + insn = add_insn[vece]; + goto gen_simd; + case INDEX_op_sub_vec: + insn = sub_insn[vece]; + goto gen_simd; + case INDEX_op_mul_vec: + insn = mul_insn[vece]; + goto gen_simd; + case INDEX_op_and_vec: + insn = OPC_PAND; + goto gen_simd; + case INDEX_op_or_vec: + insn = OPC_POR; + goto gen_simd; + case INDEX_op_xor_vec: + insn = OPC_PXOR; + goto gen_simd; + case INDEX_op_zipl_vec: + case INDEX_op_x86_punpckl_vec: + insn = punpckl_insn[vece]; + goto gen_simd; + case INDEX_op_ziph_vec: + case INDEX_op_x86_punpckh_vec: + insn = punpckh_insn[vece]; + goto gen_simd; + case INDEX_op_x86_packss_vec: + insn = packss_insn[vece]; + goto gen_simd; + case INDEX_op_x86_packus_vec: + insn = packus_insn[vece]; + goto gen_simd; + gen_simd: + tcg_debug_assert(insn != OPC_UD2); + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a1, a2); + break; + + case INDEX_op_extsl_vec: + insn = pmovsx_insn[vece]; + goto gen_simd2; + case INDEX_op_extul_vec: + insn = pmovzx_insn[vece]; + goto gen_simd2; + gen_simd2: + tcg_debug_assert(vece < MO_64); + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, 0, a1); + break; + + case INDEX_op_cmp_vec: + sub = args[3]; + if (sub == TCG_COND_EQ) { + insn = cmpeq_insn[vece]; + } else if (sub == TCG_COND_GT) { + insn = cmpgt_insn[vece]; + } else { + g_assert_not_reached(); + } + goto gen_simd; + + case INDEX_op_andc_vec: + insn = OPC_PANDN; + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a2, a1); + break; + + case INDEX_op_shli_vec: + sub = 6; + goto gen_shift; + case INDEX_op_shri_vec: + sub = 2; + goto gen_shift; + case INDEX_op_sari_vec: + tcg_debug_assert(vece != MO_64); + sub = 4; + gen_shift: + tcg_debug_assert(vece != MO_8); + insn = shift_imm_insn[vece]; + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, sub, a0, a1); + tcg_out8(s, a2); + break; + + case INDEX_op_ld_vec: + tcg_out_ld(s, type, a0, a1, a2); + break; + case INDEX_op_st_vec: + tcg_out_st(s, type, a0, a1, a2); + break; + case INDEX_op_movi_vec: + tcg_out_movi_vec(s, type, a0, args + 1); + break; + case INDEX_op_dup_vec: + tcg_out_dup_vec(s, type, vece, a0, a1); + break; + + case INDEX_op_x86_shufps_vec: + insn = OPC_SHUFPS; + sub = args[3]; + goto gen_simd_imm8; + case INDEX_op_x86_blend_vec: + if (vece == MO_16) { + insn = OPC_PBLENDW; + } else if (vece == MO_32) { + insn = (have_avx2 ? OPC_VPBLENDD : OPC_BLENDPS); + } else { + g_assert_not_reached(); + } + sub = args[3]; + goto gen_simd_imm8; + case INDEX_op_x86_vperm2i128_vec: + insn = OPC_VPERM2I128; + sub = args[3]; + goto gen_simd_imm8; + gen_simd_imm8: + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a1, a2); + tcg_out8(s, sub); + break; + + case INDEX_op_x86_vpblendvb_vec: + insn = OPC_VPBLENDVB; + if (type == TCG_TYPE_V256) { + insn |= P_VEXL; + } + tcg_out_vex_modrm(s, insn, a0, a1, a2); + tcg_out8(s, args[3] << 4); + break; + + case INDEX_op_x86_psrldq_vec: + tcg_out_vex_modrm(s, OPC_GRP14, 3, a0, a1); + tcg_out8(s, a2); + break; + + default: + g_assert_not_reached(); + } +} + static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) { static const TCGTargetOpDef r = { .args_ct_str = { "r" } }; @@ -2292,6 +2898,11 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) = { .args_ct_str = { "r", "r", "L", "L" } }; static const TCGTargetOpDef L_L_L_L = { .args_ct_str = { "L", "L", "L", "L" } }; + static const TCGTargetOpDef x_x = { .args_ct_str = { "x", "x" } }; + static const TCGTargetOpDef x_x_x = { .args_ct_str = { "x", "x", "x" } }; + static const TCGTargetOpDef x_x_x_x + = { .args_ct_str = { "x", "x", "x", "x" } }; + static const TCGTargetOpDef x_r = { .args_ct_str = { "x", "r" } }; switch (op) { case INDEX_op_goto_ptr: @@ -2493,12 +3104,608 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) return &s2; } + case INDEX_op_ld_vec: + case INDEX_op_st_vec: + return &x_r; + + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_mul_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_cmp_vec: + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + case INDEX_op_x86_shufps_vec: + case INDEX_op_x86_blend_vec: + case INDEX_op_x86_packss_vec: + case INDEX_op_x86_packus_vec: + case INDEX_op_x86_vperm2i128_vec: + case INDEX_op_x86_punpckl_vec: + case INDEX_op_x86_punpckh_vec: + return &x_x_x; + case INDEX_op_dup_vec: + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extul_vec: + case INDEX_op_x86_psrldq_vec: + return &x_x; + case INDEX_op_x86_vpblendvb_vec: + return &x_x_x_x; + default: break; } return NULL; } +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) +{ + switch (opc) { + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extul_vec: + return 1; + case INDEX_op_cmp_vec: + case INDEX_op_extsh_vec: + case INDEX_op_extuh_vec: + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + return -1; + + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + /* We must expand the operation for MO_8. */ + return vece == MO_8 ? -1 : 1; + + case INDEX_op_sari_vec: + /* We must expand the operation for MO_8. */ + if (vece == MO_8) { + return -1; + } + /* We can emulate this for MO_64, but it does not pay off + unless we're producing at least 4 values. */ + if (vece == MO_64) { + return type >= TCG_TYPE_V256 ? -1 : 0; + } + return 1; + + case INDEX_op_mul_vec: + if (vece == MO_8) { + /* We can expand the operation for MO_8. */ + return -1; + } + if (vece == MO_64) { + return 0; + } + return 1; + + case INDEX_op_zipl_vec: + /* We could support v256, but with 3 insns per opcode. + It is better to expand with v128 instead. */ + return type <= TCG_TYPE_V128; + case INDEX_op_ziph_vec: + if (type == TCG_TYPE_V64) { + return -1; + } + return type == TCG_TYPE_V128; + + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + /* ??? Not implemented for V256. */ + return -(type <= TCG_TYPE_V128); + + default: + return 0; + } +} + +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg a0, ...) +{ + va_list va; + TCGArg a1, a2; + TCGv_vec v0, v1, v2, t1, t2, t3, t4; + + va_start(va, a0); + v0 = temp_tcgv_vec(arg_temp(a0)); + + switch (opc) { + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + tcg_debug_assert(vece == MO_8); + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + /* Unpack to W, shift, and repack. Tricky bits: + (1) Use punpck*bw x,x to produce DDCCBBAA, + i.e. duplicate in other half of the 16-bit lane. + (2) For right-shift, add 8 so that the high half of + the lane becomes zero. For left-shift, we must + shift up and down again. + (3) Step 2 leaves high half zero such that PACKUSWB + (pack with unsigned saturation) does not modify + the quantity. */ + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_zipl_vec, type, MO_8, tcgv_vec_arg(t1), a1, a1); + vec_gen_3(INDEX_op_ziph_vec, type, MO_8, tcgv_vec_arg(t2), a1, a1); + if (opc == INDEX_op_shri_vec) { + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), a2 + 8); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2 + 8); + } else { + vec_gen_3(INDEX_op_shli_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), a2 + 8); + vec_gen_3(INDEX_op_shli_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2 + 8); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), 8); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), 8); + } + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + + case INDEX_op_sari_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + if (vece == MO_8) { + /* Unpack to W, shift, and repack, as above. */ + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_zipl_vec, type, MO_8, tcgv_vec_arg(t1), a1, a1); + vec_gen_3(INDEX_op_ziph_vec, type, MO_8, tcgv_vec_arg(t2), a1, a1); + vec_gen_3(INDEX_op_sari_vec, type, MO_16, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), a2 + 8); + vec_gen_3(INDEX_op_sari_vec, type, MO_16, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2 + 8); + vec_gen_3(INDEX_op_x86_packss_vec, type, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + } + tcg_debug_assert(vece == MO_64); + /* MO_64: If the shift is <= 32, we can emulate the sign extend by + performing an arithmetic 32-bit shift and overwriting the high + half of the result (note that the ISA says shift of 32 is valid). */ + if (a2 <= 32) { + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_sari_vec, type, MO_32, tcgv_vec_arg(t1), a1, a2); + vec_gen_3(INDEX_op_shri_vec, type, MO_64, a0, a1, a2); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_32, + a0, a0, tcgv_vec_arg(t1), 0xaa); + tcg_temp_free_vec(t1); + break; + } + /* Otherwise we will need to use a compare vs 0 to produce the + sign-extend, shift and merge. */ + t1 = tcg_temp_new_vec(type); + t2 = tcg_const_zeros_vec(type); + vec_gen_4(INDEX_op_cmp_vec, type, MO_64, + tcgv_vec_arg(t1), tcgv_vec_arg(t2), a1, TCG_COND_GT); + tcg_temp_free_vec(t2); + vec_gen_3(INDEX_op_shri_vec, type, MO_64, a0, a1, a2); + vec_gen_3(INDEX_op_shli_vec, type, MO_64, + tcgv_vec_arg(t1), tcgv_vec_arg(t1), 64 - a2); + vec_gen_3(INDEX_op_or_vec, type, MO_64, a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + + case INDEX_op_mul_vec: + tcg_debug_assert(vece == MO_8); + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + switch (type) { + case TCG_TYPE_V64: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup16i_vec(t2, 0); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t1), a1, tcgv_vec_arg(t2)); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t2), tcgv_vec_arg(t2), a2); + tcg_gen_mul_vec(MO_16, t1, t1, t2); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + + case TCG_TYPE_V128: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + t3 = tcg_temp_new_vec(TCG_TYPE_V128); + t4 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup16i_vec(t4, 0); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t1), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t2), tcgv_vec_arg(t4), a2); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t3), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V128, MO_8, + tcgv_vec_arg(t4), tcgv_vec_arg(t4), a2); + tcg_gen_mul_vec(MO_16, t1, t1, t2); + tcg_gen_mul_vec(MO_16, t3, t3, t4); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + tcg_gen_shri_vec(MO_16, t3, t3, 8); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t3)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t3); + tcg_temp_free_vec(t4); + break; + + case TCG_TYPE_V256: + t1 = tcg_temp_new_vec(TCG_TYPE_V256); + t2 = tcg_temp_new_vec(TCG_TYPE_V256); + t3 = tcg_temp_new_vec(TCG_TYPE_V256); + t4 = tcg_temp_new_vec(TCG_TYPE_V256); + tcg_gen_dup16i_vec(t4, 0); + /* a1: A[0-7] ... D[0-7]; a2: W[0-7] ... Z[0-7] + t1: extends of B[0-7], D[0-7] + t2: extends of X[0-7], Z[0-7] + t3: extends of A[0-7], C[0-7] + t4: extends of W[0-7], Y[0-7]. */ + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t1), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t2), tcgv_vec_arg(t4), a2); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t3), a1, tcgv_vec_arg(t4)); + vec_gen_3(INDEX_op_ziph_vec, TCG_TYPE_V256, MO_8, + tcgv_vec_arg(t4), tcgv_vec_arg(t4), a2); + /* t1: BX DZ; t2: AW CY. */ + tcg_gen_mul_vec(MO_16, t1, t1, t2); + tcg_gen_mul_vec(MO_16, t3, t3, t4); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + tcg_gen_shri_vec(MO_16, t3, t3, 8); + /* a0: AW BX CY DZ. */ + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V256, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t3)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + tcg_temp_free_vec(t3); + tcg_temp_free_vec(t4); + break; + + default: + g_assert_not_reached(); + } + break; + + case INDEX_op_ziph_vec: + tcg_debug_assert(type == TCG_TYPE_V64); + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, vece, a0, a1, a2); + vec_gen_3(INDEX_op_x86_psrldq_vec, TCG_TYPE_V128, MO_64, a0, a0, 8); + break; + + case INDEX_op_extsh_vec: + case INDEX_op_extuh_vec: + a1 = va_arg(va, TCGArg); + switch (type) { + case TCG_TYPE_V64: + vec_gen_3(INDEX_op_x86_psrldq_vec, type, MO_64, a0, a1, 4); + break; + case TCG_TYPE_V128: + vec_gen_3(INDEX_op_x86_psrldq_vec, type, MO_64, a0, a1, 8); + break; + case TCG_TYPE_V256: + vec_gen_4(INDEX_op_x86_vperm2i128_vec, type, 4, a0, a1, a1, 0x81); + break; + default: + g_assert_not_reached(); + } + vec_gen_2(opc == INDEX_op_extsh_vec ? INDEX_op_extsl_vec + : INDEX_op_extul_vec, type, vece, a0, a0); + break; + + case INDEX_op_uzpe_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + v1 = temp_tcgv_vec(arg_temp(a1)); + v2 = temp_tcgv_vec(arg_temp(a2)); + + if (type == TCG_TYPE_V128) { + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + tcg_gen_dup16i_vec(t2, 0x00ff); + tcg_gen_and_vec(MO_16, t1, v2, t2); + tcg_gen_and_vec(MO_16, v0, v1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + tcg_gen_dup32i_vec(t2, 0x0000ffff); + tcg_gen_and_vec(MO_32, t1, v2, t2); + tcg_gen_and_vec(MO_32, v0, v1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_16, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_32: + vec_gen_4(INDEX_op_x86_shufps_vec, type, MO_32, + a0, a1, a2, 0x88); + break; + case MO_64: + tcg_gen_zipl_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } else { + tcg_debug_assert(type == TCG_TYPE_V64); + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup16i_vec(t2, 0x00ff); + tcg_gen_and_vec(MO_16, t1, t1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + t2 = tcg_temp_new_vec(TCG_TYPE_V128); + tcg_gen_dup32i_vec(t2, 0x0000ffff); + tcg_gen_and_vec(MO_32, t1, t1, t2); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_16, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_32: + tcg_gen_zipl_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } + break; + + case INDEX_op_uzpo_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + v1 = temp_tcgv_vec(arg_temp(a1)); + v2 = temp_tcgv_vec(arg_temp(a2)); + + if (type == TCG_TYPE_V128) { + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + tcg_gen_shri_vec(MO_16, t1, v2, 8); + tcg_gen_shri_vec(MO_16, v0, v1, 8); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_8, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + tcg_gen_shri_vec(MO_32, t1, v2, 16); + tcg_gen_shri_vec(MO_32, v0, v1, 16); + vec_gen_3(INDEX_op_x86_packus_vec, type, MO_16, + a0, a0, tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_32: + vec_gen_4(INDEX_op_x86_shufps_vec, type, MO_32, + a0, a1, a2, 0xdd); + break; + case MO_64: + tcg_gen_ziph_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } else { + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + tcg_gen_shri_vec(MO_16, t1, t1, 8); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_16: + t1 = tcg_temp_new_vec(TCG_TYPE_V128); + vec_gen_3(INDEX_op_zipl_vec, TCG_TYPE_V128, MO_64, + tcgv_vec_arg(t1), a1, a2); + tcg_gen_shri_vec(MO_32, t1, t1, 16); + vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_16, + a0, tcgv_vec_arg(t1), tcgv_vec_arg(t1)); + tcg_temp_free_vec(t1); + break; + case MO_32: + tcg_gen_ziph_vec(vece, v0, v1, v2); + break; + default: + g_assert_not_reached(); + } + } + break; + + case INDEX_op_trne_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shli_vec, type, MO_16, + tcgv_vec_arg(t1), a2, 8); + tcg_gen_dup16i_vec(t2, 0xff00); + vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, MO_8, + a0, a1, tcgv_vec_arg(t1), tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shli_vec, type, MO_32, + tcgv_vec_arg(t1), a2, 16); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_16, + a0, a1, tcgv_vec_arg(t1), 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_32: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shli_vec, type, MO_64, + tcgv_vec_arg(t1), a2, 32); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_32, + a0, a1, tcgv_vec_arg(t1), 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_64: + vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_64, a0, a1, a2); + break; + default: + g_assert_not_reached(); + } + break; + + case INDEX_op_trno_vec: + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + switch (vece) { + case MO_8: + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shri_vec, type, MO_16, + tcgv_vec_arg(t1), a1, 8); + tcg_gen_dup16i_vec(t2, 0xff00); + vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, MO_8, + a0, tcgv_vec_arg(t1), a2, tcgv_vec_arg(t2)); + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + break; + case MO_16: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shri_vec, type, MO_32, + tcgv_vec_arg(t1), a1, 16); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_16, + a0, tcgv_vec_arg(t1), a2, 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_32: + t1 = tcg_temp_new_vec(type); + vec_gen_3(INDEX_op_shri_vec, type, MO_64, + tcgv_vec_arg(t1), a1, 32); + vec_gen_4(INDEX_op_x86_blend_vec, type, MO_32, + a0, tcgv_vec_arg(t1), a2, 0xaa); + tcg_temp_free_vec(t1); + break; + case MO_64: + vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_64, a0, a1, a2); + break; + default: + g_assert_not_reached(); + } + break; + + case INDEX_op_cmp_vec: + { + enum { + NEED_SWAP = 1, + NEED_INV = 2, + NEED_BIAS = 4 + }; + static const uint8_t fixups[16] = { + [0 ... 15] = -1, + [TCG_COND_EQ] = 0, + [TCG_COND_NE] = NEED_INV, + [TCG_COND_GT] = 0, + [TCG_COND_LT] = NEED_SWAP, + [TCG_COND_LE] = NEED_INV, + [TCG_COND_GE] = NEED_SWAP | NEED_INV, + [TCG_COND_GTU] = NEED_BIAS, + [TCG_COND_LTU] = NEED_BIAS | NEED_SWAP, + [TCG_COND_LEU] = NEED_BIAS | NEED_INV, + [TCG_COND_GEU] = NEED_BIAS | NEED_SWAP | NEED_INV, + }; + + TCGCond cond; + uint8_t fixup; + + a1 = va_arg(va, TCGArg); + a2 = va_arg(va, TCGArg); + cond = va_arg(va, TCGArg); + fixup = fixups[cond & 15]; + tcg_debug_assert(fixup != 0xff); + + if (fixup & NEED_INV) { + cond = tcg_invert_cond(cond); + } + if (fixup & NEED_SWAP) { + TCGArg t; + t = a1, a1 = a2, a2 = t; + cond = tcg_swap_cond(cond); + } + + t1 = t2 = NULL; + if (fixup & NEED_BIAS) { + t1 = tcg_temp_new_vec(type); + t2 = tcg_temp_new_vec(type); + tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1)); + tcg_gen_sub_vec(vece, t1, temp_tcgv_vec(arg_temp(a1)), t2); + tcg_gen_sub_vec(vece, t2, temp_tcgv_vec(arg_temp(a2)), t2); + a1 = tcgv_vec_arg(t1); + a2 = tcgv_vec_arg(t2); + cond = tcg_signed_cond(cond); + } + + tcg_debug_assert(cond == TCG_COND_EQ || cond == TCG_COND_GT); + vec_gen_4(INDEX_op_cmp_vec, type, vece, a0, a1, a2, cond); + + if (fixup & NEED_BIAS) { + tcg_temp_free_vec(t1); + tcg_temp_free_vec(t2); + } + if (fixup & NEED_INV) { + tcg_gen_not_vec(vece, v0, v0); + } + } + break; + + default: + break; + } + + va_end(va); +} + static const int tcg_target_callee_save_regs[] = { #if TCG_TARGET_REG_BITS == 64 TCG_REG_RBP, @@ -2577,6 +3784,9 @@ static void tcg_target_qemu_prologue(TCGContext *s) tcg_out_addi(s, TCG_REG_CALL_STACK, stack_addend); + if (have_avx2) { + tcg_out_vex_opc(s, OPC_VZEROUPPER, 0, 0, 0, 0); + } for (i = ARRAY_SIZE(tcg_target_callee_save_regs) - 1; i >= 0; i--) { tcg_out_pop(s, tcg_target_callee_save_regs[i]); } @@ -2598,9 +3808,16 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count) static void tcg_target_init(TCGContext *s) { #ifdef CONFIG_CPUID_H - unsigned a, b, c, d; + unsigned a, b, c, d, b7 = 0; int max = __get_cpuid_max(0, 0); + if (max >= 7) { + /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ + __cpuid_count(7, 0, a, b7, c, d); + have_bmi1 = (b7 & bit_BMI) != 0; + have_bmi2 = (b7 & bit_BMI2) != 0; + } + if (max >= 1) { __cpuid(1, a, b, c, d); #ifndef have_cmov @@ -2609,17 +3826,22 @@ static void tcg_target_init(TCGContext *s) available, we'll use a small forward branch. */ have_cmov = (d & bit_CMOV) != 0; #endif + /* MOVBE is only available on Intel Atom and Haswell CPUs, so we need to probe for it. */ have_movbe = (c & bit_MOVBE) != 0; have_popcnt = (c & bit_POPCNT) != 0; - } - if (max >= 7) { - /* BMI1 is available on AMD Piledriver and Intel Haswell CPUs. */ - __cpuid_count(7, 0, a, b, c, d); - have_bmi1 = (b & bit_BMI) != 0; - have_bmi2 = (b & bit_BMI2) != 0; + /* There are a number of things we must check before we can be + sure of not hitting invalid opcode. */ + if (c & bit_OSXSAVE) { + unsigned xcrl, xcrh; + asm ("xgetbv" : "=a" (xcrl), "=d" (xcrh) : "c" (0)); + if ((xcrl & 6) == 6) { + have_avx1 = (c & bit_AVX) != 0; + have_avx2 = (b7 & bit_AVX2) != 0; + } + } } max = __get_cpuid_max(0x8000000, 0); @@ -2630,11 +3852,16 @@ static void tcg_target_init(TCGContext *s) } #endif /* CONFIG_CPUID_H */ + tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS; if (TCG_TARGET_REG_BITS == 64) { - tcg_target_available_regs[TCG_TYPE_I32] = 0xffff; - tcg_target_available_regs[TCG_TYPE_I64] = 0xffff; - } else { - tcg_target_available_regs[TCG_TYPE_I32] = 0xff; + tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS; + } + if (have_avx1) { + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS; + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS; + } + if (have_avx2) { + tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS; } tcg_target_call_clobber_regs = 0; From patchwork Sat Jan 6 03:13:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 123615 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp129011qgn; Fri, 5 Jan 2018 19:26:06 -0800 (PST) X-Google-Smtp-Source: ACJfBotxGgZPbr0w6cj832is4NRN4/WE349lgVkp2+ZdjG5thWkRf8ekcxBSQBJFQefJxLmFJRz4 X-Received: by 10.37.111.131 with SMTP id k125mr4730707ybc.274.1515209166513; Fri, 05 Jan 2018 19:26:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1515209166; cv=none; d=google.com; s=arc-20160816; b=f6w29254D5+s7S+rVPGY4nfUrUP3ERPbUCgZSLpg2G2StDB1+TddD0WDY16tckk9ra rc+uspjK5h9L2OTQy5He/kwmaqXaatKwYJT4RmviKKl1XVrJv3EjQx0BJBmFURvAxZrD wDmC9VadAej8mzUCWKBVN0jnUCxQXgtO0Kv4Ko6r3NzjnP6PLwOVb195442ML3BXcIkc wTuOV8cHriaw780ssxC48HgAGoNZyQX9YCh1PWSYYRvGUgkeQ4Nv/9H2YC6C5XJw0J8C fQwpL64T8sHclNjcGKQWXN1yRkhhF3EUnjtqNJxeS46E6vzezjLMpmfyCQMN14f/p7gL 0Sog== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=52x2COf8UwUEyQcq35qjtAcZZnCn7a8/X+asC2/WP3U=; b=ecKBiXLRlY3vEExamv2rT/LvBDmv/yQ7dncEWU0fspKDE7bAXKft0cJ4lcyGCYCV4n yd3IABW/20GuVtx28RsvIADvJGBBJWWPO23tgcmQ1/3pvCkjBfeaS1RX0cCbs5Kt1q9A OGki2PWHZ6lx4bIDcYF8aNnlJqGarxrjLWKhEqRLsx4AAx6ziPsTr+D8E12nmQhtePrH muXK3sVq4o6D9WQUa2iBvXheoVBWIlXZSH/kMqZ4Bt96c1plSTsAkBnl2MBm9pdMo5Ng /vdcrSMYVKpt4XHtLYZPoORRkOI8E5OiYuGMFTw4mi+oF/J/Wdhk9MonjU2Zkxxsp4lH 484Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=YwbQ4ajf; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id n25si1503259ywh.499.2018.01.05.19.26.06 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Jan 2018 19:26:06 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=YwbQ4ajf; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom=qemu-devel-bounces+patch=linaro.org@nongnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:44100 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXf7V-0000bz-U4 for patch@linaro.org; Fri, 05 Jan 2018 22:26:05 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48509) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eXewJ-0007rI-NX for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eXewF-0005vx-Sj for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:31 -0500 Received: from mail-pg0-x244.google.com ([2607:f8b0:400e:c05::244]:46441) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eXewF-0005u8-HH for qemu-devel@nongnu.org; Fri, 05 Jan 2018 22:14:27 -0500 Received: by mail-pg0-x244.google.com with SMTP id r2so2708485pgq.13 for ; Fri, 05 Jan 2018 19:14:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:subject:date:message-id:in-reply-to:references; bh=52x2COf8UwUEyQcq35qjtAcZZnCn7a8/X+asC2/WP3U=; b=YwbQ4ajfYkmn7IaOxCLFYhKq1wHATkhZ8txVnw8zzY9uO9MXMUJVYEJexdxPcWpn5z up+XgH3qyG/lDp7uW/xJtXf2ZVCRVWjSmwUY3PqFze2RSf4JQlc49ojALuNB/x1KMgqj Ag9xbbbCb9gQS6ND4ZORTPhL2GVN8nADQoA88= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=52x2COf8UwUEyQcq35qjtAcZZnCn7a8/X+asC2/WP3U=; b=bI+4VYs/GF65Qk0tAEqGQrcz6DOGoMzYv354tj+49V5KAQUtMDeBBjV/miV0r8K1zL AFmNIJS03vS06aRcZrZ+8LKe08BZC9k73j1V01AE3naGKkYcjO4Sbz+QuBZfpX2rhdd4 gaFVGYt7fsZrA+qmszaomPO9gNuKTptu5UFhPX/l+xTG5pAQc8GZ4E06lihdLyfS2APL epfBwOkJSaOq9xZLiqnKYiDxGUyvgdzyu3z6XZ3wTyKmXIVM0Sdmk3RTTXyFmScfmzOa EMAJUSm/ZZqhlyHR+TkbMMn79gm+kAQnJFzpxubdDTcjuUdA5k07uBQAPzw+dwjiDNcC C9dQ== X-Gm-Message-State: AKGB3mKVDHNdSDlOjC3MdstmKqBsJDSHqSliyNUuNGYmF8+hSyrkt//Q JU6zD9L4sPuSz4Ud/flHCapGFiaxYyg= X-Received: by 10.98.75.139 with SMTP id d11mr4798707pfj.11.1515208465745; Fri, 05 Jan 2018 19:14:25 -0800 (PST) Received: from cloudburst.twiddle.net (97-113-183-164.tukw.qwest.net. [97.113.183.164]) by smtp.gmail.com with ESMTPSA id g10sm17740595pfe.77.2018.01.05.19.14.24 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Jan 2018 19:14:24 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Jan 2018 19:13:46 -0800 Message-Id: <20180106031346.6650-24-richard.henderson@linaro.org> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180106031346.6650-1-richard.henderson@linaro.org> References: <20180106031346.6650-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c05::244 Subject: [Qemu-devel] [PATCH v8 23/23] tcg/aarch64: Add vector operations X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Richard Henderson --- tcg/aarch64/tcg-target.h | 30 +- tcg/aarch64/tcg-target.opc.h | 3 + tcg/aarch64/tcg-target.inc.c | 674 ++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 660 insertions(+), 47 deletions(-) create mode 100644 tcg/aarch64/tcg-target.opc.h -- 2.14.3 diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h index c2525066ab..46434ecca4 100644 --- a/tcg/aarch64/tcg-target.h +++ b/tcg/aarch64/tcg-target.h @@ -31,13 +31,22 @@ typedef enum { TCG_REG_SP = 31, TCG_REG_XZR = 31, + TCG_REG_V0 = 32, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, + TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, + TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, + TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31, + /* Aliases. */ TCG_REG_FP = TCG_REG_X29, TCG_REG_LR = TCG_REG_X30, TCG_AREG0 = TCG_REG_X19, } TCGReg; -#define TCG_TARGET_NB_REGS 32 +#define TCG_TARGET_NB_REGS 64 /* used for function call generation */ #define TCG_REG_CALL_STACK TCG_REG_SP @@ -113,6 +122,25 @@ typedef enum { #define TCG_TARGET_HAS_mulsh_i64 1 #define TCG_TARGET_HAS_direct_jump 1 +#define TCG_TARGET_HAS_v64 1 +#define TCG_TARGET_HAS_v128 1 +#define TCG_TARGET_HAS_v256 0 + +#define TCG_TARGET_HAS_andc_vec 1 +#define TCG_TARGET_HAS_orc_vec 1 +#define TCG_TARGET_HAS_not_vec 1 +#define TCG_TARGET_HAS_neg_vec 1 +#define TCG_TARGET_HAS_shi_vec 1 +#define TCG_TARGET_HAS_shs_vec 0 +#define TCG_TARGET_HAS_shv_vec 0 +#define TCG_TARGET_HAS_zip_vec 1 +#define TCG_TARGET_HAS_uzp_vec 1 +#define TCG_TARGET_HAS_trn_vec 1 +#define TCG_TARGET_HAS_cmp_vec 1 +#define TCG_TARGET_HAS_mul_vec 1 +#define TCG_TARGET_HAS_extl_vec 1 +#define TCG_TARGET_HAS_exth_vec 1 + #define TCG_TARGET_DEFAULT_MO (0) static inline void flush_icache_range(uintptr_t start, uintptr_t stop) diff --git a/tcg/aarch64/tcg-target.opc.h b/tcg/aarch64/tcg-target.opc.h new file mode 100644 index 0000000000..4816a6c3d4 --- /dev/null +++ b/tcg/aarch64/tcg-target.opc.h @@ -0,0 +1,3 @@ +/* Target-specific opcodes for host vector expansion. These will be + emitted by tcg_expand_vec_op. For those familiar with GCC internals, + consider these to be UNSPEC with names. */ diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c index 150530f30e..b2ce818d7c 100644 --- a/tcg/aarch64/tcg-target.inc.c +++ b/tcg/aarch64/tcg-target.inc.c @@ -20,10 +20,15 @@ QEMU_BUILD_BUG_ON(TCG_TYPE_I32 != 0 || TCG_TYPE_I64 != 1); #ifdef CONFIG_DEBUG_TCG static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { - "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7", - "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15", - "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23", - "%x24", "%x25", "%x26", "%x27", "%x28", "%fp", "%x30", "%sp", + "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", + "x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15", + "x16", "x17", "x18", "x19", "x20", "x21", "x22", "x23", + "x24", "x25", "x26", "x27", "x28", "fp", "x30", "sp", + + "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", + "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", + "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", + "v24", "v25", "v26", "v27", "v28", "fp", "v30", "v31", }; #endif /* CONFIG_DEBUG_TCG */ @@ -43,6 +48,14 @@ static const int tcg_target_reg_alloc_order[] = { /* X19 reserved for AREG0 */ /* X29 reserved as fp */ /* X30 reserved as temporary */ + + TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, + TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, + /* V8 - V15 are call-saved, and skipped. */ + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31, }; static const int tcg_target_call_iarg_regs[8] = { @@ -54,6 +67,7 @@ static const int tcg_target_call_oarg_regs[1] = { }; #define TCG_REG_TMP TCG_REG_X30 +#define TCG_VEC_TMP TCG_REG_V31 #ifndef CONFIG_SOFTMMU /* Note that XZR cannot be encoded in the address base register slot, @@ -119,9 +133,13 @@ static const char *target_parse_constraint(TCGArgConstraint *ct, const char *ct_str, TCGType type) { switch (*ct_str++) { - case 'r': + case 'r': /* general registers */ ct->ct |= TCG_CT_REG; - ct->u.regs = 0xffffffffu; + ct->u.regs |= 0xffffffffu; + break; + case 'w': /* advsimd registers */ + ct->ct |= TCG_CT_REG; + ct->u.regs |= 0xffffffff00000000ull; break; case 'l': /* qemu_ld / qemu_st address, data_reg */ ct->ct |= TCG_CT_REG; @@ -178,6 +196,98 @@ static inline bool is_limm(uint64_t val) return (val & (val - 1)) == 0; } +static bool is_fimm(uint64_t v64, int *op, int *cmode, int *imm8) +{ + int i; + + *op = 0; + if (v64 == (-1ull / 0xff) * (v64 & 0xff)) { + *cmode = 0xe; + *imm8 = v64 & 0xff; + return true; + } + if (v64 == (-1ull / 0xffff) * (v64 & 0xffff)) { + uint64_t v16 = v64 & 0xffff; + + if (v16 == (v64 & 0xff)) { + *cmode = 0x8; + *imm8 = v64 & 0xff; + return true; + } else if (v16 == (v64 & 0xff00)) { + *cmode = 0xa; + *imm8 = v16 >> 8; + return true; + } + } + if (v64 == deposit64(v64, 32, 32, v64)) { + uint64_t v32 = (uint32_t)v64; + + if (v32 == (v64 & 0xff)) { + *cmode = 0x0; + *imm8 = v64 & 0xff; + return true; + } else if (v32 == (v32 & 0xff00)) { + *cmode = 0x2; + *imm8 = (v64 >> 8) & 0xff; + return true; + } else if (v32 == (v32 & 0xff0000)) { + *cmode = 0x4; + *imm8 = (v64 >> 16) & 0xff; + return true; + } else if (v32 == (v32 & 0xff000000)) { + *cmode = 0x6; + *imm8 = v32 >> 24; + return true; + } else if ((v32 & 0xffff00ff) == 0xff) { + *cmode = 0xc; + *imm8 = (v64 >> 8) & 0xff; + return true; + } else if ((v32 & 0xff00ffff) == 0xffff) { + *cmode = 0xd; + *imm8 = (v64 >> 16) & 0xff; + return true; + } else if (extract32(v32, 0, 19) == 0 + && (extract32(v32, 25, 6) == 0x20 + || extract32(v32, 25, 6) == 0x1f)) { + *cmode = 0xf; + *imm8 = (extract32(v32, 31, 1) << 7) + | (extract32(v32, 25, 1) << 6) + | extract32(v32, 19, 6); + return true; + } + } + if (extract64(v64, 0, 48) == 0 + && (extract64(v64, 54, 9) == 0x100 + || extract64(v64, 54, 9) == 0x0ff)) { + *cmode = 0xf; + *op = 1; + *imm8 = (extract64(v64, 63, 1) << 7) + | (extract64(v64, 54, 1) << 6) + | extract64(v64, 48, 6); + return true; + } + for (i = 0; i < 64; i += 8) { + uint64_t byte = extract64(v64, i, 8); + if (byte != 0 && byte != 0xff) { + break; + } + } + if (i == 64) { + *cmode = 0xe; + *op = 1; + *imm8 = (extract64(v64, 0, 1) << 0) + | (extract64(v64, 8, 1) << 1) + | (extract64(v64, 16, 1) << 2) + | (extract64(v64, 24, 1) << 3) + | (extract64(v64, 32, 1) << 4) + | (extract64(v64, 40, 1) << 5) + | (extract64(v64, 48, 1) << 6) + | (extract64(v64, 56, 1) << 7); + return true; + } + return false; +} + static int tcg_target_const_match(tcg_target_long val, TCGType type, const TCGArgConstraint *arg_ct) { @@ -271,6 +381,9 @@ typedef enum { /* Load literal for loading the address at pc-relative offset */ I3305_LDR = 0x58000000, + I3305_LDR_v64 = 0x5c000000, + I3305_LDR_v128 = 0x9c000000, + /* Load/store register. Described here as 3.3.12, but the helper that emits them can transform to 3.3.10 or 3.3.13. */ I3312_STRB = 0x38000000 | LDST_ST << 22 | MO_8 << 30, @@ -290,6 +403,15 @@ typedef enum { I3312_LDRSHX = 0x38000000 | LDST_LD_S_X << 22 | MO_16 << 30, I3312_LDRSWX = 0x38000000 | LDST_LD_S_X << 22 | MO_32 << 30, + I3312_LDRVS = 0x3c000000 | LDST_LD << 22 | MO_32 << 30, + I3312_STRVS = 0x3c000000 | LDST_ST << 22 | MO_32 << 30, + + I3312_LDRVD = 0x3c000000 | LDST_LD << 22 | MO_64 << 30, + I3312_STRVD = 0x3c000000 | LDST_ST << 22 | MO_64 << 30, + + I3312_LDRVQ = 0x3c000000 | 3 << 22 | 0 << 30, + I3312_STRVQ = 0x3c000000 | 2 << 22 | 0 << 30, + I3312_TO_I3310 = 0x00200800, I3312_TO_I3313 = 0x01000000, @@ -374,8 +496,58 @@ typedef enum { I3510_EON = 0x4a200000, I3510_ANDS = 0x6a000000, - NOP = 0xd503201f, + /* AdvSIMD zip.uzp/trn */ + I3603_ZIP1 = 0x0e003800, + I3603_UZP1 = 0x0e001800, + I3603_TRN1 = 0x0e002800, + I3603_ZIP2 = 0x0e007800, + I3603_UZP2 = 0x0e005800, + I3603_TRN2 = 0x0e006800, + + /* AdvSIMD copy */ + I3605_DUP = 0x0e000400, + I3605_INS = 0x4e001c00, + I3605_UMOV = 0x0e003c00, + + /* AdvSIMD modified immediate */ + I3606_MOVI = 0x0f000400, + + /* AdvSIMD shift by immediate */ + I3614_SSHR = 0x0f000400, + I3614_SSRA = 0x0f001400, + I3614_SHL = 0x0f005400, + I3614_SSHLL = 0x0f00a400, + I3614_USHR = 0x2f000400, + I3614_USRA = 0x2f001400, + I3614_USHLL = 0x2f00a400, + + /* AdvSIMD three same. */ + I3616_ADD = 0x0e208400, + I3616_AND = 0x0e201c00, + I3616_BIC = 0x0e601c00, + I3616_EOR = 0x2e201c00, + I3616_MUL = 0x0e209c00, + I3616_ORR = 0x0ea01c00, + I3616_ORN = 0x0ee01c00, + I3616_SUB = 0x2e208400, + I3616_CMGT = 0x0e203400, + I3616_CMGE = 0x0e203c00, + I3616_CMTST = 0x0e208c00, + I3616_CMHI = 0x2e203400, + I3616_CMHS = 0x2e203c00, + I3616_CMEQ = 0x2e208c00, + + /* AdvSIMD two-reg misc. */ + I3617_CMGT0 = 0x0e208800, + I3617_CMEQ0 = 0x0e209800, + I3617_CMLT0 = 0x0e20a800, + I3617_CMGE0 = 0x2e208800, + I3617_CMLE0 = 0x2e20a800, + I3617_NOT = 0x2e205800, + I3617_NEG = 0x2e20b800, + /* System instructions. */ + NOP = 0xd503201f, DMB_ISH = 0xd50338bf, DMB_LD = 0x00000100, DMB_ST = 0x00000200, @@ -520,26 +692,71 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext, tcg_out32(s, insn | ext << 31 | rm << 16 | ra << 10 | rn << 5 | rd); } +static void tcg_out_insn_3603(TCGContext *s, AArch64Insn insn, bool q, + unsigned size, TCGReg rd, TCGReg rn, TCGReg rm) +{ + tcg_out32(s, insn | q << 30 | (size << 22) | (rd & 0x1f) + | (rn & 0x1f) << 5 | (rm & 0x1f) << 16); +} + +static void tcg_out_insn_3605(TCGContext *s, AArch64Insn insn, bool q, + TCGReg rd, TCGReg rn, int dst_idx, int src_idx) +{ + /* Note that bit 11 set means general register input. Therefore + we can handle both register sets with one function. */ + tcg_out32(s, insn | q << 30 | (dst_idx << 16) | (src_idx << 11) + | (rd & 0x1f) | (~rn & 0x20) << 6 | (rn & 0x1f) << 5); +} + +static void tcg_out_insn_3606(TCGContext *s, AArch64Insn insn, bool q, + TCGReg rd, bool op, int cmode, uint8_t imm8) +{ + tcg_out32(s, insn | q << 30 | op << 29 | cmode << 12 | (rd & 0x1f) + | (imm8 & 0xe0) << (16 - 5) | (imm8 & 0x1f) << 5); +} + +static void tcg_out_insn_3614(TCGContext *s, AArch64Insn insn, bool q, + TCGReg rd, TCGReg rn, unsigned immhb) +{ + tcg_out32(s, insn | q << 30 | immhb << 16 + | (rn & 0x1f) << 5 | (rd & 0x1f)); +} + +static void tcg_out_insn_3616(TCGContext *s, AArch64Insn insn, bool q, + unsigned size, TCGReg rd, TCGReg rn, TCGReg rm) +{ + tcg_out32(s, insn | q << 30 | (size << 22) | (rm & 0x1f) << 16 + | (rn & 0x1f) << 5 | (rd & 0x1f)); +} + +static void tcg_out_insn_3617(TCGContext *s, AArch64Insn insn, bool q, + unsigned size, TCGReg rd, TCGReg rn) +{ + tcg_out32(s, insn | q << 30 | (size << 22) + | (rn & 0x1f) << 5 | (rd & 0x1f)); +} + static void tcg_out_insn_3310(TCGContext *s, AArch64Insn insn, TCGReg rd, TCGReg base, TCGType ext, TCGReg regoff) { /* Note the AArch64Insn constants above are for C3.3.12. Adjust. */ tcg_out32(s, insn | I3312_TO_I3310 | regoff << 16 | - 0x4000 | ext << 13 | base << 5 | rd); + 0x4000 | ext << 13 | base << 5 | (rd & 0x1f)); } static void tcg_out_insn_3312(TCGContext *s, AArch64Insn insn, TCGReg rd, TCGReg rn, intptr_t offset) { - tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | rd); + tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | (rd & 0x1f)); } static void tcg_out_insn_3313(TCGContext *s, AArch64Insn insn, TCGReg rd, TCGReg rn, uintptr_t scaled_uimm) { /* Note the AArch64Insn constants above are for C3.3.12. Adjust. */ - tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 | rn << 5 | rd); + tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 + | rn << 5 | (rd & 0x1f)); } /* Register to register move using ORR (shifted register with no shift). */ @@ -585,6 +802,35 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext, tcg_out_insn_3404(s, insn, ext, rd, rn, ext, r, c); } +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, + TCGReg rd, uint64_t v64) +{ + int op, cmode, imm8; + + if (is_fimm(v64, &op, &cmode, &imm8)) { + tcg_out_insn(s, 3606, MOVI, type == TCG_TYPE_V128, rd, op, cmode, imm8); + } else if (type == TCG_TYPE_V128) { + new_pool_l2(s, R_AARCH64_CONDBR19, s->code_ptr, 0, v64, v64); + tcg_out_insn(s, 3305, LDR_v128, 0, rd); + } else { + new_pool_label(s, v64, R_AARCH64_CONDBR19, s->code_ptr, 0); + tcg_out_insn(s, 3305, LDR_v64, 0, rd); + } +} + +static void tcg_out_movi_vec(TCGContext *s, TCGType type, + TCGReg ret, const TCGArg *a) +{ + if (type == TCG_TYPE_V128) { + /* We assume that INDEX_op_dupi could not be used and + therefore we must use a constant pool entry. */ + new_pool_l2(s, R_AARCH64_CONDBR19, s->code_ptr, 0, a[0], a[1]); + tcg_out_insn(s, 3305, LDR_v128, 0, ret); + } else { + tcg_out_dupi_vec(s, TCG_TYPE_V64, ret, a[0]); + } +} + static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd, tcg_target_long value) { @@ -594,6 +840,22 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd, int s0, s1; AArch64Insn opc; + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + tcg_debug_assert(rd < 32); + break; + + case TCG_TYPE_V64: + case TCG_TYPE_V128: + tcg_debug_assert(rd >= 32); + tcg_out_dupi_vec(s, type, rd, value); + return; + + default: + g_assert_not_reached(); + } + /* For 32-bit values, discard potential garbage in value. For 64-bit values within [2**31, 2**32-1], we can create smaller sequences by interpreting this as a negative 32-bit number, while ensuring that @@ -669,15 +931,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd, /* Define something more legible for general use. */ #define tcg_out_ldst_r tcg_out_insn_3310 -static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, - TCGReg rd, TCGReg rn, intptr_t offset) +static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd, + TCGReg rn, intptr_t offset, int lgsize) { - TCGMemOp size = (uint32_t)insn >> 30; - /* If the offset is naturally aligned and in range, then we can use the scaled uimm12 encoding */ - if (offset >= 0 && !(offset & ((1 << size) - 1))) { - uintptr_t scaled_uimm = offset >> size; + if (offset >= 0 && !(offset & ((1 << lgsize) - 1))) { + uintptr_t scaled_uimm = offset >> lgsize; if (scaled_uimm <= 0xfff) { tcg_out_insn_3313(s, insn, rd, rn, scaled_uimm); return; @@ -695,32 +955,102 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP); } -static inline void tcg_out_mov(TCGContext *s, - TCGType type, TCGReg ret, TCGReg arg) +static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) { - if (ret != arg) { - tcg_out_movr(s, type, ret, arg); + if (ret == arg) { + return; + } + switch (type) { + case TCG_TYPE_I32: + case TCG_TYPE_I64: + if (ret < 32 && arg < 32) { + tcg_out_movr(s, type, ret, arg); + break; + } else if (ret < 32) { + tcg_out_insn(s, 3605, UMOV, type, ret, arg, 0, 0); + break; + } else if (arg < 32) { + tcg_out_insn(s, 3605, INS, 0, ret, arg, 4 << type, 0); + break; + } + /* FALLTHRU */ + + case TCG_TYPE_V64: + tcg_debug_assert(ret >= 32 && arg >= 32); + tcg_out_insn(s, 3616, ORR, 0, 0, ret, arg, arg); + break; + case TCG_TYPE_V128: + tcg_debug_assert(ret >= 32 && arg >= 32); + tcg_out_insn(s, 3616, ORR, 1, 0, ret, arg, arg); + break; + + default: + g_assert_not_reached(); } } -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, + TCGReg base, intptr_t ofs) { - tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_LDRW : I3312_LDRX, - arg, arg1, arg2); + AArch64Insn insn; + int lgsz; + + switch (type) { + case TCG_TYPE_I32: + insn = (ret < 32 ? I3312_LDRW : I3312_LDRVS); + lgsz = 2; + break; + case TCG_TYPE_I64: + insn = (ret < 32 ? I3312_LDRX : I3312_LDRVD); + lgsz = 3; + break; + case TCG_TYPE_V64: + insn = I3312_LDRVD; + lgsz = 3; + break; + case TCG_TYPE_V128: + insn = I3312_LDRVQ; + lgsz = 4; + break; + default: + g_assert_not_reached(); + } + tcg_out_ldst(s, insn, ret, base, ofs, lgsz); } -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, - TCGReg arg1, intptr_t arg2) +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg src, + TCGReg base, intptr_t ofs) { - tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_STRW : I3312_STRX, - arg, arg1, arg2); + AArch64Insn insn; + int lgsz; + + switch (type) { + case TCG_TYPE_I32: + insn = (src < 32 ? I3312_STRW : I3312_STRVS); + lgsz = 2; + break; + case TCG_TYPE_I64: + insn = (src < 32 ? I3312_STRX : I3312_STRVD); + lgsz = 3; + break; + case TCG_TYPE_V64: + insn = I3312_STRVD; + lgsz = 3; + break; + case TCG_TYPE_V128: + insn = I3312_STRVQ; + lgsz = 4; + break; + default: + g_assert_not_reached(); + } + tcg_out_ldst(s, insn, src, base, ofs, lgsz); } static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, TCGReg base, intptr_t ofs) { - if (val == 0) { + if (type <= TCG_TYPE_I64 && val == 0) { tcg_out_st(s, type, TCG_REG_XZR, base, ofs); return true; } @@ -1210,14 +1540,15 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc, /* Merge "low bits" from tlb offset, load the tlb comparator into X0. X0 = load [X2 + (tlb_offset & 0x000fff)] */ tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX, - TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff); + TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff, + TARGET_LONG_BITS == 32 ? 2 : 3); /* Load the tlb addend. Do that early to avoid stalling. X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */ tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2, (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) - (is_read ? offsetof(CPUTLBEntry, addr_read) - : offsetof(CPUTLBEntry, addr_write))); + : offsetof(CPUTLBEntry, addr_write)), 3); /* Perform the address comparison. */ tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0); @@ -1435,49 +1766,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_ld8u_i32: case INDEX_op_ld8u_i64: - tcg_out_ldst(s, I3312_LDRB, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRB, a0, a1, a2, 0); break; case INDEX_op_ld8s_i32: - tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2, 0); break; case INDEX_op_ld8s_i64: - tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2, 0); break; case INDEX_op_ld16u_i32: case INDEX_op_ld16u_i64: - tcg_out_ldst(s, I3312_LDRH, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRH, a0, a1, a2, 1); break; case INDEX_op_ld16s_i32: - tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2, 1); break; case INDEX_op_ld16s_i64: - tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2, 1); break; case INDEX_op_ld_i32: case INDEX_op_ld32u_i64: - tcg_out_ldst(s, I3312_LDRW, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRW, a0, a1, a2, 2); break; case INDEX_op_ld32s_i64: - tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2, 2); break; case INDEX_op_ld_i64: - tcg_out_ldst(s, I3312_LDRX, a0, a1, a2); + tcg_out_ldst(s, I3312_LDRX, a0, a1, a2, 3); break; case INDEX_op_st8_i32: case INDEX_op_st8_i64: - tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2, 0); break; case INDEX_op_st16_i32: case INDEX_op_st16_i64: - tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2, 1); break; case INDEX_op_st_i32: case INDEX_op_st32_i64: - tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2, 2); break; case INDEX_op_st_i64: - tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2); + tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2, 3); break; case INDEX_op_add_i32: @@ -1776,25 +2107,230 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, case INDEX_op_mov_i32: /* Always emitted via tcg_out_mov. */ case INDEX_op_mov_i64: + case INDEX_op_mov_vec: case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi. */ case INDEX_op_movi_i64: + case INDEX_op_dupi_vec: case INDEX_op_call: /* Always emitted via tcg_out_call. */ default: - tcg_abort(); + g_assert_not_reached(); } #undef REG0 } +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, + unsigned vecl, unsigned vece, + const TCGArg *args, const int *const_args) +{ + static const AArch64Insn cmp_insn[16] = { + [TCG_COND_EQ] = I3616_CMEQ, + [TCG_COND_GT] = I3616_CMGT, + [TCG_COND_GE] = I3616_CMGE, + [TCG_COND_GTU] = I3616_CMHI, + [TCG_COND_GEU] = I3616_CMHS, + }; + static const AArch64Insn cmp0_insn[16] = { + [TCG_COND_EQ] = I3617_CMEQ0, + [TCG_COND_GT] = I3617_CMGT0, + [TCG_COND_GE] = I3617_CMGE0, + [TCG_COND_LT] = I3617_CMLT0, + [TCG_COND_LE] = I3617_CMLE0, + }; + + TCGType type = vecl + TCG_TYPE_V64; + unsigned is_q = vecl; + TCGArg a0, a1, a2; + + a0 = args[0]; + a1 = args[1]; + a2 = args[2]; + + switch (opc) { + case INDEX_op_movi_vec: + tcg_out_movi_vec(s, type, a0, args + 1); + break; + case INDEX_op_ld_vec: + tcg_out_ld(s, type, a0, a1, a2); + break; + case INDEX_op_st_vec: + tcg_out_st(s, type, a0, a1, a2); + break; + case INDEX_op_add_vec: + tcg_out_insn(s, 3616, ADD, is_q, vece, a0, a1, a2); + break; + case INDEX_op_sub_vec: + tcg_out_insn(s, 3616, SUB, is_q, vece, a0, a1, a2); + break; + case INDEX_op_mul_vec: + tcg_out_insn(s, 3616, MUL, is_q, vece, a0, a1, a2); + break; + case INDEX_op_neg_vec: + tcg_out_insn(s, 3617, NEG, is_q, vece, a0, a1); + break; + case INDEX_op_and_vec: + tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2); + break; + case INDEX_op_or_vec: + tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2); + break; + case INDEX_op_xor_vec: + tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2); + break; + case INDEX_op_andc_vec: + tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2); + break; + case INDEX_op_orc_vec: + tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2); + break; + case INDEX_op_not_vec: + tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a1); + break; + case INDEX_op_dup_vec: + tcg_out_insn(s, 3605, DUP, is_q, a0, a1, 1 << vece, 0); + break; + case INDEX_op_shli_vec: + tcg_out_insn(s, 3614, SHL, is_q, a0, a1, a2 + (8 << vece)); + break; + case INDEX_op_shri_vec: + tcg_out_insn(s, 3614, USHR, is_q, a0, a1, (16 << vece) - a2); + break; + case INDEX_op_sari_vec: + tcg_out_insn(s, 3614, SSHR, is_q, a0, a1, (16 << vece) - a2); + break; + case INDEX_op_cmp_vec: + { + TCGCond cond = args[3]; + AArch64Insn insn; + + if (cond == TCG_COND_NE) { + if (const_args[2]) { + tcg_out_insn(s, 3616, CMTST, is_q, vece, a0, a1, a1); + } else { + tcg_out_insn(s, 3616, CMEQ, is_q, vece, a0, a1, a2); + tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a0); + } + } else { + if (const_args[2]) { + insn = cmp0_insn[cond]; + if (insn) { + tcg_out_insn_3617(s, insn, is_q, vece, a0, a1); + break; + } + tcg_out_dupi_vec(s, type, TCG_VEC_TMP, 0); + a2 = TCG_VEC_TMP; + } + insn = cmp_insn[cond]; + if (insn == 0) { + TCGArg t; + t = a1, a1 = a2, a2 = t; + cond = tcg_swap_cond(cond); + insn = cmp_insn[cond]; + tcg_debug_assert(insn != 0); + } + tcg_out_insn_3616(s, insn, is_q, vece, a0, a1, a2); + } + } + break; + case INDEX_op_zipl_vec: + tcg_out_insn(s, 3603, ZIP1, is_q, vece, a0, a1, a2); + break; + case INDEX_op_ziph_vec: + tcg_out_insn(s, 3603, ZIP2, is_q, vece, a0, a1, a2); + break; + case INDEX_op_uzpe_vec: + tcg_out_insn(s, 3603, UZP1, is_q, vece, a0, a1, a2); + break; + case INDEX_op_uzpo_vec: + tcg_out_insn(s, 3603, UZP2, is_q, vece, a0, a1, a2); + break; + case INDEX_op_trne_vec: + tcg_out_insn(s, 3603, TRN1, is_q, vece, a0, a1, a2); + break; + case INDEX_op_trno_vec: + tcg_out_insn(s, 3603, TRN2, is_q, vece, a0, a1, a2); + break; + case INDEX_op_extul_vec: + tcg_out_insn(s, 3614, USHLL, 0, a0, a1, 0 + (8 << vece)); + break; + case INDEX_op_extuh_vec: + if (is_q) { + tcg_out_insn(s, 3614, USHLL, 1, a0, a1, 0 + (8 << vece)); + } else { + tcg_out_insn(s, 3614, USHLL, 0, a0, a1, 0 + (8 << vece)); + tcg_out_insn(s, 3605, INS, 0, a0, a0, 8, 16); + } + break; + case INDEX_op_extsl_vec: + tcg_out_insn(s, 3614, SSHLL, 0, a0, a1, 0 + (8 << vece)); + break; + case INDEX_op_extsh_vec: + if (is_q) { + tcg_out_insn(s, 3614, SSHLL, 1, a0, a1, 0 + (8 << vece)); + } else { + tcg_out_insn(s, 3614, SSHLL, 0, a0, a1, 0 + (8 << vece)); + tcg_out_insn(s, 3605, INS, 0, a0, a0, 8, 16); + } + break; + default: + g_assert_not_reached(); + } +} + +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) +{ + switch (opc) { + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_mul_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_orc_vec: + case INDEX_op_neg_vec: + case INDEX_op_not_vec: + case INDEX_op_cmp_vec: + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + case INDEX_op_extul_vec: + case INDEX_op_extuh_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extsh_vec: + return 1; + + default: + return 0; + } +} + +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, + TCGArg a0, ...) +{ +} + static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) { static const TCGTargetOpDef r = { .args_ct_str = { "r" } }; + static const TCGTargetOpDef w = { .args_ct_str = { "w" } }; static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } }; + static const TCGTargetOpDef w_w = { .args_ct_str = { "w", "w" } }; + static const TCGTargetOpDef w_r = { .args_ct_str = { "w", "r" } }; + static const TCGTargetOpDef w_wr = { .args_ct_str = { "w", "wr" } }; static const TCGTargetOpDef r_l = { .args_ct_str = { "r", "l" } }; static const TCGTargetOpDef r_rA = { .args_ct_str = { "r", "rA" } }; static const TCGTargetOpDef rZ_r = { .args_ct_str = { "rZ", "r" } }; static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } }; static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } }; + static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } }; + static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } }; static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } }; static const TCGTargetOpDef r_r_rA = { .args_ct_str = { "r", "r", "rA" } }; static const TCGTargetOpDef r_r_rL = { .args_ct_str = { "r", "r", "rL" } }; @@ -1938,6 +2474,41 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) case INDEX_op_sub2_i64: return &add2; + case INDEX_op_movi_vec: + return &w; + case INDEX_op_add_vec: + case INDEX_op_sub_vec: + case INDEX_op_mul_vec: + case INDEX_op_and_vec: + case INDEX_op_or_vec: + case INDEX_op_xor_vec: + case INDEX_op_andc_vec: + case INDEX_op_orc_vec: + case INDEX_op_zipl_vec: + case INDEX_op_ziph_vec: + case INDEX_op_uzpe_vec: + case INDEX_op_uzpo_vec: + case INDEX_op_trne_vec: + case INDEX_op_trno_vec: + return &w_w_w; + case INDEX_op_not_vec: + case INDEX_op_neg_vec: + case INDEX_op_shli_vec: + case INDEX_op_shri_vec: + case INDEX_op_sari_vec: + case INDEX_op_extul_vec: + case INDEX_op_extuh_vec: + case INDEX_op_extsl_vec: + case INDEX_op_extsh_vec: + return &w_w; + case INDEX_op_ld_vec: + case INDEX_op_st_vec: + return &w_r; + case INDEX_op_dup_vec: + return &w_wr; + case INDEX_op_cmp_vec: + return &w_w_wZ; + default: return NULL; } @@ -1947,8 +2518,10 @@ static void tcg_target_init(TCGContext *s) { tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffffu; tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffffu; + tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull; + tcg_target_available_regs[TCG_TYPE_V128] = 0xffffffff00000000ull; - tcg_target_call_clobber_regs = 0xfffffffu; + tcg_target_call_clobber_regs = -1ull; tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X19); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X20); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X21); @@ -1960,12 +2533,21 @@ static void tcg_target_init(TCGContext *s) tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X27); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X28); tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_X29); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V8); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V9); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V10); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V11); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V12); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V13); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V14); + tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V15); s->reserved_regs = 0; tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP); tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP); tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP); tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */ + tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP); } /* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)). */