From patchwork Thu Oct 11 20:52:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 148677 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp2599445lji; Thu, 11 Oct 2018 13:58:46 -0700 (PDT) X-Google-Smtp-Source: ACcGV63mnlJfIuphuH6XnMg/qeD0z14QB7tOLH62YRuPbL8S5oIwleY1LCphGki++IMQ5AZSOLU/ X-Received: by 2002:a37:b504:: with SMTP id e4-v6mr2990509qkf.255.1539291526130; Thu, 11 Oct 2018 13:58:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1539291526; cv=none; d=google.com; s=arc-20160816; b=IEJVafFd2JUVYn4/2wvIW6bKQvEocdHa/u9z4ZUOSLejAlEZPLfxNMAg7BDkyv5TOI AHDOoLhDhJl2fEOWSvVgeEju/JG4F5sY9mD/DpCwo4UuxoqEJVA7TOo7CowW66yBa4+e fcdx9n4z9I4BLDxmyy97jsdyq0p8lbeTk+o8FnA5H8rOVW2rhDF1cYXGjE5QkAdD8K50 fTtd2TgvFoVE1qcBjyHzQx3m2uhe0J0gzmFzIOMv0RAO98+TRfyADs+HAdh0BBT7amwp 2h4nOp9zQCNkQRWJhWsd08YQtZLoNbgOq7EBPawUMtvwQuWmE+QZGxnaSnfAM29R+KQN yfKQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature; bh=zdE4q8LpLXaAekQeUtxbcMeL3xVr4K+/hq+3vvQudb4=; b=xj8r8z+dEwwo8b11fq4pW66hsr8S7mhfYCksggP3evBmaj9LKDRUoWCGsbzgVfwqxy 25/NFUW0lpWXbQbFwFY66UNVPllevcGDu/TyRhxbJ2r1BuUoIjMsBypBM7XBpUAYnFq0 E4GJmjbzJY/PHsqY3r++eq4smoNRx27M2OsOqp43Li/bU7K27KMAZYUxgk0ITGM8u834 /ZEgGLZzNI/ULiaX+aoK24gNdUYAkjnc8lQqQkuVxt6gRIpuvLJ6EN+hE2Co4mxBmxZb wjo886y/knf+mjHebUJgcCCqfKo9frlef+55fN5y1hJU8XuBVYS/6C0d9vBCY0CunG28 Rvxw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=HTk0aGWq; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id k14si2172278qvp.85.2018.10.11.13.58.45 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 11 Oct 2018 13:58:46 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=HTk0aGWq; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:37096 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gAi2f-00032t-H7 for patch@linaro.org; Thu, 11 Oct 2018 16:58:45 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45122) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gAhwn-0006pr-Lq for qemu-devel@nongnu.org; Thu, 11 Oct 2018 16:52:45 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gAhwl-0005ux-Cq for qemu-devel@nongnu.org; Thu, 11 Oct 2018 16:52:41 -0400 Received: from mail-pf1-x435.google.com ([2607:f8b0:4864:20::435]:36424) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1gAhwk-0005d6-Su for qemu-devel@nongnu.org; Thu, 11 Oct 2018 16:52:39 -0400 Received: by mail-pf1-x435.google.com with SMTP id l81-v6so5011983pfg.3 for ; Thu, 11 Oct 2018 13:52:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=zdE4q8LpLXaAekQeUtxbcMeL3xVr4K+/hq+3vvQudb4=; b=HTk0aGWqHEb2Ayv1X7VXqC4Ag0SmQePwyiWe/bPMe4OVudQ6us0c1HIV/qiXzaYwNV 1UT6cqkEajmhz6ScI8Aez4EoVZzm7Bq1ofubfAp2sf5POhmMu1xdQBXLpsmA4JngPK8s CYy0U7IMDYTOOd7tvXN7ncJaki31mBxzdMH5g= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=zdE4q8LpLXaAekQeUtxbcMeL3xVr4K+/hq+3vvQudb4=; b=cY164ykhWXtNli6egroK6lzT/ENU0/LF4sExNURXfodbnTJkIz7VcyX/Po49m3JHwZ XW9oC7g70Iqb+/eJmdehoM9gDCv2vwPbE3GbsBX9xxWB2NFsMtO/5/Qk9BN3y3TcpH/T e32lWzxUICAYk0op839O/9koXAwWBvH+2OOjsf+Pigte3MGuJpxMCSSQcUEHb/B6cIVX YUx+CJnggWTjLPGl1bH5dDbD7BG2BEBaeEJYmZMr1FAwtwrKWcMEBwRiiNv8t2r6EK7o QgGrvztuXNcrho1G8u70u9gOJ4QTaUzjskiGJw+S3MnWJb28IuKpYQAf2wZVt01XAvV1 Au6Q== X-Gm-Message-State: ABuFfogCMXn3H6IhhKoa6qIHghPL5BJWZEwRWLsvjz2r1EuLFgDO3Eq+ +ag+G5DeebbSw/s3A1GncLgdsjSlWl8= X-Received: by 2002:a62:9850:: with SMTP id q77-v6mr3036346pfd.249.1539291155509; Thu, 11 Oct 2018 13:52:35 -0700 (PDT) Received: from cloudburst.twiddle.net (97-113-8-179.tukw.qwest.net. [97.113.8.179]) by smtp.gmail.com with ESMTPSA id h87-v6sm34707866pfj.78.2018.10.11.13.52.34 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 11 Oct 2018 13:52:34 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Thu, 11 Oct 2018 13:52:04 -0700 Message-Id: <20181011205206.3552-19-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181011205206.3552-1-richard.henderson@linaro.org> References: <20181011205206.3552-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::435 Subject: [Qemu-devel] [PATCH 18/20] target/arm: Reorg NEON VLD/VST all elements X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Instead of shifts and masks, use direct loads and stores from the neon register file. Mirror the iteration structure of the ARM pseudocode more closely. Correct the parameters of the VLD2 A2 insn. Signed-off-by: Richard Henderson --- target/arm/translate.c | 170 ++++++++++++++++++----------------------- 1 file changed, 74 insertions(+), 96 deletions(-) -- 2.17.1 diff --git a/target/arm/translate.c b/target/arm/translate.c index 1e79a1eec0..12a744b3c3 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -1611,12 +1611,56 @@ static TCGv_i32 neon_load_reg(int reg, int pass) return tmp; } +static void neon_load_element64(TCGv_i64 var, int reg, int ele, TCGMemOp mop) +{ + long offset = neon_element_offset(reg, ele, mop & MO_SIZE); + + switch (mop) { + case MO_UB: + tcg_gen_ld8u_i64(var, cpu_env, offset); + break; + case MO_UW: + tcg_gen_ld16u_i64(var, cpu_env, offset); + break; + case MO_UL: + tcg_gen_ld32u_i64(var, cpu_env, offset); + break; + case MO_Q: + tcg_gen_ld_i64(var, cpu_env, offset); + break; + default: + g_assert_not_reached(); + } +} + static void neon_store_reg(int reg, int pass, TCGv_i32 var) { tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass)); tcg_temp_free_i32(var); } +static void neon_store_element64(int reg, int ele, TCGMemOp size, TCGv_i64 var) +{ + long offset = neon_element_offset(reg, ele, size); + + switch (size) { + case MO_8: + tcg_gen_st8_i64(var, cpu_env, offset); + break; + case MO_16: + tcg_gen_st16_i64(var, cpu_env, offset); + break; + case MO_32: + tcg_gen_st32_i64(var, cpu_env, offset); + break; + case MO_64: + tcg_gen_st_i64(var, cpu_env, offset); + break; + default: + g_assert_not_reached(); + } +} + static inline void neon_load_reg64(TCGv_i64 var, int reg) { tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg)); @@ -4885,16 +4929,16 @@ static struct { int interleave; int spacing; } const neon_ls_element_type[11] = { - {4, 4, 1}, - {4, 4, 2}, + {1, 4, 1}, + {1, 4, 2}, {4, 1, 1}, - {4, 2, 1}, - {3, 3, 1}, - {3, 3, 2}, + {2, 2, 2}, + {1, 3, 1}, + {1, 3, 2}, {3, 1, 1}, {1, 1, 1}, - {2, 2, 1}, - {2, 2, 2}, + {1, 2, 1}, + {1, 2, 2}, {2, 1, 1} }; @@ -4915,6 +4959,8 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) int shift; int n; int vec_size; + int mmu_idx; + TCGMemOp endian; TCGv_i32 addr; TCGv_i32 tmp; TCGv_i32 tmp2; @@ -4936,6 +4982,8 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) rn = (insn >> 16) & 0xf; rm = insn & 0xf; load = (insn & (1 << 21)) != 0; + endian = s->be_data; + mmu_idx = get_mem_index(s); if ((insn & (1 << 23)) == 0) { /* Load store all elements. */ op = (insn >> 8) & 0xf; @@ -4960,104 +5008,34 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) nregs = neon_ls_element_type[op].nregs; interleave = neon_ls_element_type[op].interleave; spacing = neon_ls_element_type[op].spacing; - if (size == 3 && (interleave | spacing) != 1) + if (size == 3 && (interleave | spacing) != 1) { return 1; + } + tmp64 = tcg_temp_new_i64(); addr = tcg_temp_new_i32(); + tmp2 = tcg_const_i32(1 << size); load_reg_var(s, addr, rn); - stride = (1 << size) * interleave; for (reg = 0; reg < nregs; reg++) { - if (interleave > 2 || (interleave == 2 && nregs == 2)) { - load_reg_var(s, addr, rn); - tcg_gen_addi_i32(addr, addr, (1 << size) * reg); - } else if (interleave == 2 && nregs == 4 && reg == 2) { - load_reg_var(s, addr, rn); - tcg_gen_addi_i32(addr, addr, 1 << size); - } - if (size == 3) { - tmp64 = tcg_temp_new_i64(); - if (load) { - gen_aa32_ld64(s, tmp64, addr, get_mem_index(s)); - neon_store_reg64(tmp64, rd); - } else { - neon_load_reg64(tmp64, rd); - gen_aa32_st64(s, tmp64, addr, get_mem_index(s)); - } - tcg_temp_free_i64(tmp64); - tcg_gen_addi_i32(addr, addr, stride); - } else { - for (pass = 0; pass < 2; pass++) { - if (size == 2) { - if (load) { - tmp = tcg_temp_new_i32(); - gen_aa32_ld32u(s, tmp, addr, get_mem_index(s)); - neon_store_reg(rd, pass, tmp); - } else { - tmp = neon_load_reg(rd, pass); - gen_aa32_st32(s, tmp, addr, get_mem_index(s)); - tcg_temp_free_i32(tmp); - } - tcg_gen_addi_i32(addr, addr, stride); - } else if (size == 1) { - if (load) { - tmp = tcg_temp_new_i32(); - gen_aa32_ld16u(s, tmp, addr, get_mem_index(s)); - tcg_gen_addi_i32(addr, addr, stride); - tmp2 = tcg_temp_new_i32(); - gen_aa32_ld16u(s, tmp2, addr, get_mem_index(s)); - tcg_gen_addi_i32(addr, addr, stride); - tcg_gen_shli_i32(tmp2, tmp2, 16); - tcg_gen_or_i32(tmp, tmp, tmp2); - tcg_temp_free_i32(tmp2); - neon_store_reg(rd, pass, tmp); - } else { - tmp = neon_load_reg(rd, pass); - tmp2 = tcg_temp_new_i32(); - tcg_gen_shri_i32(tmp2, tmp, 16); - gen_aa32_st16(s, tmp, addr, get_mem_index(s)); - tcg_temp_free_i32(tmp); - tcg_gen_addi_i32(addr, addr, stride); - gen_aa32_st16(s, tmp2, addr, get_mem_index(s)); - tcg_temp_free_i32(tmp2); - tcg_gen_addi_i32(addr, addr, stride); - } - } else /* size == 0 */ { - if (load) { - tmp2 = NULL; - for (n = 0; n < 4; n++) { - tmp = tcg_temp_new_i32(); - gen_aa32_ld8u(s, tmp, addr, get_mem_index(s)); - tcg_gen_addi_i32(addr, addr, stride); - if (n == 0) { - tmp2 = tmp; - } else { - tcg_gen_shli_i32(tmp, tmp, n * 8); - tcg_gen_or_i32(tmp2, tmp2, tmp); - tcg_temp_free_i32(tmp); - } - } - neon_store_reg(rd, pass, tmp2); - } else { - tmp2 = neon_load_reg(rd, pass); - for (n = 0; n < 4; n++) { - tmp = tcg_temp_new_i32(); - if (n == 0) { - tcg_gen_mov_i32(tmp, tmp2); - } else { - tcg_gen_shri_i32(tmp, tmp2, n * 8); - } - gen_aa32_st8(s, tmp, addr, get_mem_index(s)); - tcg_temp_free_i32(tmp); - tcg_gen_addi_i32(addr, addr, stride); - } - tcg_temp_free_i32(tmp2); - } + for (n = 0; n < 8 >> size; n++) { + int xs; + for (xs = 0; xs < interleave; xs++) { + int tt = rd + reg + spacing * xs; + + if (load) { + gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size); + neon_store_element64(tt, n, size, tmp64); + } else { + neon_load_element64(tmp64, tt, n, size); + gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size); } + tcg_gen_add_i32(addr, addr, tmp2); } } - rd += spacing; } tcg_temp_free_i32(addr); - stride = nregs * 8; + tcg_temp_free_i32(tmp2); + tcg_temp_free_i64(tmp64); + stride = nregs * interleave * 8; } else { size = (insn >> 10) & 3; if (size == 3) {