From patchwork Wed Jun 27 04:33:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 140124
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 26 Jun 2018 21:33:25 -0700
Message-Id: <20180627043328.11531-33-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180627043328.11531-1-richard.henderson@linaro.org>
References: <20180627043328.11531-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v6 32/35] target/arm: Implement SVE dot product (vectors)
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
 target/arm/helper.h        |  5 +++
 target/arm/translate-sve.c | 17 ++++++++++
 target/arm/vec_helper.c    | 67 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  3 ++
 4 files changed, 92 insertions(+)

-- 
2.17.1

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 8607077dda..e23ce7ff19 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -583,6 +583,11 @@ DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4f2152fb70..8a2bd1f8c5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3423,6 +3423,23 @@ DO_ZZI(UMIN, umin)
 
 #undef DO_ZZI
 
+static bool trans_DOT_zzz(DisasContext *s, arg_DOT_zzz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[2][2] = {
+        { gen_helper_gvec_sdot_b, gen_helper_gvec_sdot_h },
+        { gen_helper_gvec_udot_b, gen_helper_gvec_udot_h }
+    };
+
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vsz, vsz, 0, fns[a->u][a->sz]);
+    }
+    return true;
+}
+
 /*
  *** SVE Floating Point Multiply-Add Indexed Group
  */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index db5aeb9f24..c16a30c3b5 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -194,6 +194,73 @@ void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
+/* Integer 8 and 16-bit dot-product.
+ *
+ * Note that for the loops herein, host endianness does not matter
+ * with respect to the ordering of data within the 64-bit lanes.
+ * All elements are treated equally, no matter where they are.
+ */
+
+void HELPER(gvec_sdot_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint32_t *d = vd;
+    int8_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] += n[i * 4 + 0] * m[i * 4 + 0]
+              + n[i * 4 + 1] * m[i * 4 + 1]
+              + n[i * 4 + 2] * m[i * 4 + 2]
+              + n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_udot_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint32_t *d = vd;
+    uint8_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] += n[i * 4 + 0] * m[i * 4 + 0]
+              + n[i * 4 + 1] * m[i * 4 + 1]
+              + n[i * 4 + 2] * m[i * 4 + 2]
+              + n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sdot_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint64_t *d = vd;
+    int16_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        d[i] += (int64_t)n[i * 4 + 0] * m[i * 4 + 0]
+              + (int64_t)n[i * 4 + 1] * m[i * 4 + 1]
+              + (int64_t)n[i * 4 + 2] * m[i * 4 + 2]
+              + (int64_t)n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_udot_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint64_t *d = vd;
+    uint16_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        d[i] += (uint64_t)n[i * 4 + 0] * m[i * 4 + 0]
+              + (uint64_t)n[i * 4 + 1] * m[i * 4 + 1]
+              + (uint64_t)n[i * 4 + 2] * m[i * 4 + 2]
+              + (uint64_t)n[i * 4 + 3] * m[i * 4 + 3];
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
                          void *vfpst, uint32_t desc)
 {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 62365ed90f..35415bfb6c 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -725,6 +725,9 @@ UMIN_zzi        00100101 .. 101 011 110 ........ .....          @rdn_i8u
 # SVE integer multiply immediate (unpredicated)
 MUL_zzi         00100101 .. 110 000 110 ........ .....          @rdn_i8s
 
+# SVE integer dot product (unpredicated)
+DOT_zzz         01000100 1 sz:1 0 rm:5 00000 u:1 rn:5 rd:5
+
 # SVE floating-point complex add (predicated)
 FCADD           01100100 esz:2 00000 rot:1 100 pg:3 rm:5 rd:5 \
                 rn=%reg_movprfx
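
For anyone who wants to check the byte-element arithmetic outside of QEMU, here is a minimal stand-alone sketch of what gvec_sdot_b and gvec_udot_b compute per 32-bit lane. It is only a model, not part of the patch: the names sdot_b_model and udot_b_model are hypothetical, the size is passed directly in bytes instead of through the simd descriptor, and there is no clear_tail() step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the signed byte dot product: each 32-bit
 * destination lane accumulates the sum of four 8-bit products.
 * opr_sz is the operand size in bytes (an assumption of this sketch;
 * the real helper decodes it from the descriptor word).
 */
void sdot_b_model(uint32_t *d, const int8_t *n, const int8_t *m,
                  size_t opr_sz)
{
    assert(opr_sz % 4 == 0);
    for (size_t i = 0; i < opr_sz / 4; ++i) {
        int32_t sum = 0;
        for (size_t j = 0; j < 4; ++j) {
            /* Operands promote to int, so each product is exact. */
            sum += n[i * 4 + j] * m[i * 4 + j];
        }
        /* Accumulation wraps modulo 2^32, as with uint32_t in the patch. */
        d[i] += (uint32_t)sum;
    }
}

/* Same model for unsigned bytes. */
void udot_b_model(uint32_t *d, const uint8_t *n, const uint8_t *m,
                  size_t opr_sz)
{
    assert(opr_sz % 4 == 0);
    for (size_t i = 0; i < opr_sz / 4; ++i) {
        uint32_t sum = 0;
        for (size_t j = 0; j < 4; ++j) {
            sum += (uint32_t)n[i * 4 + j] * m[i * 4 + j];
        }
        d[i] += sum;
    }
}
```

Because every lane is handled identically and only element positions within a lane matter, the model is independent of host byte order, which matches the note in the helper comment.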