From patchwork Fri Oct 5 17:53:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 148275 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp748893lji; Fri, 5 Oct 2018 10:58:17 -0700 (PDT) X-Google-Smtp-Source: ACcGV61zdWOb73gaDMug2PgomI18rki3EQDMglKd+GYMUp9OmpIruV83U24yx6bNqP7zQ4hqbNSu X-Received: by 2002:a37:4647:: with SMTP id t68-v6mr10052234qka.260.1538762297217; Fri, 05 Oct 2018 10:58:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1538762297; cv=none; d=google.com; s=arc-20160816; b=nWmiVSLOgeuD+p7b0r9GsaHYnBLcNYEuUKKaobbA0eDgJF7zj3uB+sQHycuQ4U6wk4 xtWXH5O2A3zCaWxwnARNevoWLcfVSgV0tlSiCPN+9r26Y//4/GwtLTkxHw3ereqqvA0I 8AhC5FUX/VnAsfJgc84QIKsQ8kAhsIVU53Z2n0X3aBohJC0COODndi96TF/DF1RNSTJF Y+tL9N7T8pxVGLzK8aDFOlnO2/zAOIz5VrVfT1F4T+OouvyUkAty41dexpkU+Cp2yf2N YETOMhR4+sc0md59f76SqYnDbY4/TXelZC6Z7Ui1cg/gEbcCVOmibzPTA+37bc+YhYwp 2jSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature; bh=w34AONdmRqwOXx9FeCuBLX1xr824iQTiCwBqIr/LEPo=; b=Fk8+T9GU+BolAnjTd/+4/8NJCvcJbNHzyQ+uKdOfWb2jjSttjX5wzl3/i3EDqaiJDF 3VZRfI1SkBw/MV+HCzKDaMobaVAK2Cfi4Wg5hXbZuC85y8b008XDqcWgV8R5DfSIiEVO arfVfuab/TRSRLbSOhhdv7rA431HKrm8D6Lxb8PyqOXXke5lTnbxxrpLCri4/IV/p89L Gk2X1k4HTy/FawTGS38MQ047I24bHruUiGdU3TeYKabU53Koq5egH36xOLYKXIDGS9bI eG8zk1F/pQ+MsNA17ziaDPspflJ6Kw6gGHV5ZDUgeKAsZz+Ctq9W27uN4NJrG4SSpOPo hGXQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=WtSvWmCB; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id h26-v6si5968831qkg.284.2018.10.05.10.58.16 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 05 Oct 2018 10:58:17 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=WtSvWmCB; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:36349 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g8UMi-0004sW-IH for patch@linaro.org; Fri, 05 Oct 2018 13:58:16 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35580) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1g8UIo-0001AJ-3M for qemu-devel@nongnu.org; Fri, 05 Oct 2018 13:54:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1g8UIm-0003Mo-5p for qemu-devel@nongnu.org; Fri, 05 Oct 2018 13:54:13 -0400 Received: from mail-oi1-x22a.google.com ([2607:f8b0:4864:20::22a]:39617) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1g8UIl-0003MC-UP for qemu-devel@nongnu.org; Fri, 05 Oct 2018 13:54:12 -0400 Received: by mail-oi1-x22a.google.com with SMTP id y81-v6so11086273oia.6 for ; Fri, 05 Oct 2018 10:54:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=w34AONdmRqwOXx9FeCuBLX1xr824iQTiCwBqIr/LEPo=; b=WtSvWmCB5YOAg66PjPpr7asTN7i9rC4jNK4vz4sn3xRh2T+79aRpLB+wfDp0qjwp1q qnPjZV9gFUPw49Ze+Rras1Tm454+MCiPpdIUzX1rcUBlgJgfrSZWQgucuozNM5g7vehi w6VnWhPLoMWpXl6Uq0/qVo3Jc1OJustf+xeVA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=w34AONdmRqwOXx9FeCuBLX1xr824iQTiCwBqIr/LEPo=; b=O/QVBKsNkf9fz/GmfT4rmqaILec1M4G8fxusKmyYqYCAYyALLtKeGwLEzJVMuKff2W eoSZS6t7jFJDQSv9JK8ZgxRnffQD2M9f4nwRUIwF7xuAiy7Nmz03tJG7UWIe8YMBqmvR DkZBmRpSPA/l+2GcV1hJPlktmOjZ7Y/xyyE7EnzjNamzwiXlIBUTwYZ811F2uBc75Dwi a8Pg6OpVPRjpezYQDrmItEXMXdlYHVQDYSeuNkub3mSaYfgxZJPgCU2V6chN1Mxm9f3a NxNqxy0oLkcDQAzCn44bq20tf4zD4jz8+y74KkomT/j0upm+utLUOEHk375wtTrJlqPS WByg== X-Gm-Message-State: ABuFfoingQxAmesDkPE2suj0Z7NE24sWMCs6HBws6FMYZKBhvywHvibR 3Vk2ktLSwp9m65joAUL2y00UJMwjgGMP+8rdIYg= X-Received: by 2002:aca:c244:: with SMTP id s65-v6mr7243oif.161.1538762050596; Fri, 05 Oct 2018 10:54:10 -0700 (PDT) Received: from cloudburst.twiddle.net ([187.217.230.84]) by smtp.gmail.com with ESMTPSA id q10-v6sm2672278otg.31.2018.10.05.10.54.09 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 05 Oct 2018 10:54:09 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 5 Oct 2018 12:53:44 -0500 Message-Id: <20181005175350.30752-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181005175350.30752-1-richard.henderson@linaro.org> References: <20181005175350.30752-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::22a Subject: [Qemu-devel] [PATCH v3 09/15] target/arm: Rewrite helper_sve_st[1234]*_r X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" This fixes the endianness problem for softmmu, and moves the main loop out of a macro and into an inlined function. Reviewed-by: Peter Maydell Tested-by: Laurent Desnogues Signed-off-by: Richard Henderson --- target/arm/sve_helper.c | 351 ++++++++++++++++++++-------------------- 1 file changed, 172 insertions(+), 179 deletions(-) -- 2.17.1 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index f712b382f8..0b1e06823b 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3991,6 +3991,7 @@ typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host, */ typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off, target_ulong vaddr, int mmu_idx, uintptr_t ra); +typedef sve_ld1_tlb_fn sve_st1_tlb_fn; /* * Generate the above primitives. @@ -4668,214 +4669,206 @@ DO_LDFF1_LDNF1_2(dd, 3, 3) /* * Store contiguous data, protected by a governing predicate. */ -#define DO_ST1(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *vd = &env->vfp.zregs[rd]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m = *(TYPEE *)(vd + H(i)); \ - FN(env, addr, m, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += sizeof(TYPEM); \ - } while (i & 15); \ - } \ + +#ifdef CONFIG_SOFTMMU +#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \ +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \ + target_ulong addr, int mmu_idx, uintptr_t ra) \ +{ \ + TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \ + TLB(env, addr, *(TYPEM *)(vd + H(reg_off)), oi, ra); \ } - -#define DO_ST1_D(NAME, FN, TYPEM) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc) / 8; \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - uint64_t *d = &env->vfp.zregs[rd].d[0]; \ - uint8_t *pg = vg; \ - for (i = 0; i < oprsz; i += 1) { \ - if (pg[H1(i)] & 1) { \ - FN(env, addr, d[i], ra); \ - } \ - addr += sizeof(TYPEM); \ - } \ +#else +#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \ +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \ + target_ulong addr, int mmu_idx, uintptr_t ra) \ +{ \ + HOST(g2h(addr), *(TYPEM *)(vd + H(reg_off))); \ } +#endif -#define DO_ST2(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ - TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ - FN(env, addr, m1, ra); \ - FN(env, addr + sizeof(TYPEM), m2, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 2 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ -} +DO_ST_TLB(st1bb, H1, uint8_t, stb_p, 0, helper_ret_stb_mmu) +DO_ST_TLB(st1bh, H1_2, uint16_t, stb_p, 0, helper_ret_stb_mmu) +DO_ST_TLB(st1bs, H1_4, uint32_t, stb_p, 0, helper_ret_stb_mmu) +DO_ST_TLB(st1bd, , uint64_t, stb_p, 0, helper_ret_stb_mmu) -#define DO_ST3(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ - TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ - TYPEM m3 = *(TYPEE *)(d3 + H(i)); \ - FN(env, addr, m1, ra); \ - FN(env, addr + sizeof(TYPEM), m2, ra); \ - FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 3 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ -} +DO_ST_TLB(st1hh_le, H1_2, uint16_t, stw_le_p, MO_LE, helper_le_stw_mmu) +DO_ST_TLB(st1hs_le, H1_4, uint32_t, stw_le_p, MO_LE, helper_le_stw_mmu) +DO_ST_TLB(st1hd_le, , uint64_t, stw_le_p, MO_LE, helper_le_stw_mmu) -#define DO_ST4(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ - void *d4 = &env->vfp.zregs[(rd + 3) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - if (pg & 1) { \ - TYPEM m1 = *(TYPEE *)(d1 + H(i)); \ - TYPEM m2 = *(TYPEE *)(d2 + H(i)); \ - TYPEM m3 = *(TYPEE *)(d3 + H(i)); \ - TYPEM m4 = *(TYPEE *)(d4 + H(i)); \ - FN(env, addr, m1, ra); \ - FN(env, addr + sizeof(TYPEM), m2, ra); \ - FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \ - FN(env, addr + 3 * sizeof(TYPEM), m4, ra); \ - } \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 4 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ -} +DO_ST_TLB(st1ss_le, H1_4, uint32_t, stl_le_p, MO_LE, helper_le_stl_mmu) +DO_ST_TLB(st1sd_le, , uint64_t, stl_le_p, MO_LE, helper_le_stl_mmu) -DO_ST1(sve_st1bh_r, cpu_stb_data_ra, uint16_t, uint8_t, H1_2) -DO_ST1(sve_st1bs_r, cpu_stb_data_ra, uint32_t, uint8_t, H1_4) -DO_ST1_D(sve_st1bd_r, cpu_stb_data_ra, uint8_t) +DO_ST_TLB(st1dd_le, , uint64_t, stq_le_p, MO_LE, helper_le_stq_mmu) -DO_ST1(sve_st1hs_r, cpu_stw_data_ra, uint32_t, uint16_t, H1_4) -DO_ST1_D(sve_st1hd_r, cpu_stw_data_ra, uint16_t) +DO_ST_TLB(st1hh_be, H1_2, uint16_t, stw_be_p, MO_BE, helper_be_stw_mmu) +DO_ST_TLB(st1hs_be, H1_4, uint32_t, stw_be_p, MO_BE, helper_be_stw_mmu) +DO_ST_TLB(st1hd_be, , uint64_t, stw_be_p, MO_BE, helper_be_stw_mmu) -DO_ST1_D(sve_st1sd_r, cpu_stl_data_ra, uint32_t) +DO_ST_TLB(st1ss_be, H1_4, uint32_t, stl_be_p, MO_BE, helper_be_stl_mmu) +DO_ST_TLB(st1sd_be, , uint64_t, stl_be_p, MO_BE, helper_be_stl_mmu) -DO_ST1(sve_st1bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) -DO_ST2(sve_st2bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) -DO_ST3(sve_st3bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) -DO_ST4(sve_st4bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1) +DO_ST_TLB(st1dd_be, , uint64_t, stq_be_p, MO_BE, helper_be_stq_mmu) -DO_ST1(sve_st1hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) -DO_ST2(sve_st2hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) -DO_ST3(sve_st3hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) -DO_ST4(sve_st4hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2) +#undef DO_ST_TLB -DO_ST1(sve_st1ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) -DO_ST2(sve_st2ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) -DO_ST3(sve_st3ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) -DO_ST4(sve_st4ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4) - -DO_ST1_D(sve_st1dd_r, cpu_stq_data_ra, uint64_t) - -void HELPER(sve_st2dd_r)(CPUARMState *env, void *vg, - target_ulong addr, uint32_t desc) +/* + * Common helpers for all contiguous 1,2,3,4-register predicated stores. + */ +static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) { - intptr_t i, oprsz = simd_oprsz(desc) / 8; - intptr_t ra = GETPC(); + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); unsigned rd = simd_data(desc); - uint64_t *d1 = &env->vfp.zregs[rd].d[0]; - uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; - uint8_t *pg = vg; + void *vd = &env->vfp.zregs[rd]; - for (i = 0; i < oprsz; i += 1) { - if (pg[H1(i)] & 1) { - cpu_stq_data_ra(env, addr, d1[i], ra); - cpu_stq_data_ra(env, addr + 8, d2[i], ra); - } - addr += 2 * 8; + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, vd, i, addr, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += msize; + } while (i & 15); } + set_helper_retaddr(0); } -void HELPER(sve_st3dd_r)(CPUARMState *env, void *vg, - target_ulong addr, uint32_t desc) +static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) { - intptr_t i, oprsz = simd_oprsz(desc) / 8; - intptr_t ra = GETPC(); + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); unsigned rd = simd_data(desc); - uint64_t *d1 = &env->vfp.zregs[rd].d[0]; - uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; - uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0]; - uint8_t *pg = vg; + void *d1 = &env->vfp.zregs[rd]; + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; - for (i = 0; i < oprsz; i += 1) { - if (pg[H1(i)] & 1) { - cpu_stq_data_ra(env, addr, d1[i], ra); - cpu_stq_data_ra(env, addr + 8, d2[i], ra); - cpu_stq_data_ra(env, addr + 16, d3[i], ra); - } - addr += 3 * 8; + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, d1, i, addr, mmu_idx, ra); + tlb_fn(env, d2, i, addr + msize, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += 2 * msize; + } while (i & 15); } + set_helper_retaddr(0); } -void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg, - target_ulong addr, uint32_t desc) +static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) { - intptr_t i, oprsz = simd_oprsz(desc) / 8; - intptr_t ra = GETPC(); + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); unsigned rd = simd_data(desc); - uint64_t *d1 = &env->vfp.zregs[rd].d[0]; - uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0]; - uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0]; - uint64_t *d4 = &env->vfp.zregs[(rd + 3) & 31].d[0]; - uint8_t *pg = vg; + void *d1 = &env->vfp.zregs[rd]; + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; - for (i = 0; i < oprsz; i += 1) { - if (pg[H1(i)] & 1) { - cpu_stq_data_ra(env, addr, d1[i], ra); - cpu_stq_data_ra(env, addr + 8, d2[i], ra); - cpu_stq_data_ra(env, addr + 16, d3[i], ra); - cpu_stq_data_ra(env, addr + 24, d4[i], ra); - } - addr += 4 * 8; + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, d1, i, addr, mmu_idx, ra); + tlb_fn(env, d2, i, addr + msize, mmu_idx, ra); + tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += 3 * msize; + } while (i & 15); } + set_helper_retaddr(0); } +static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, const uintptr_t ra, + const int esize, const int msize, + sve_st1_tlb_fn *tlb_fn) +{ + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); + unsigned rd = simd_data(desc); + void *d1 = &env->vfp.zregs[rd]; + void *d2 = &env->vfp.zregs[(rd + 1) & 31]; + void *d3 = &env->vfp.zregs[(rd + 2) & 31]; + void *d4 = &env->vfp.zregs[(rd + 3) & 31]; + + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, d1, i, addr, mmu_idx, ra); + tlb_fn(env, d2, i, addr + msize, mmu_idx, ra); + tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra); + tlb_fn(env, d4, i, addr + 3 * msize, mmu_idx, ra); + } + i += esize, pg >>= esize; + addr += 4 * msize; + } while (i & 15); + } + set_helper_retaddr(0); +} + +#define DO_STN_1(N, NAME, ESIZE) \ +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \ + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \ +{ \ + sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, 1, \ + sve_st1##NAME##_tlb); \ +} + +#define DO_STN_2(N, NAME, ESIZE, MSIZE) \ +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \ + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \ +{ \ + sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE, \ + arm_cpu_data_is_big_endian(env) \ + ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \ +} + +DO_STN_1(1, bb, 1) +DO_STN_1(1, bh, 2) +DO_STN_1(1, bs, 4) +DO_STN_1(1, bd, 8) +DO_STN_1(2, bb, 1) +DO_STN_1(3, bb, 1) +DO_STN_1(4, bb, 1) + +DO_STN_2(1, hh, 2, 2) +DO_STN_2(1, hs, 4, 2) +DO_STN_2(1, hd, 8, 2) +DO_STN_2(2, hh, 2, 2) +DO_STN_2(3, hh, 2, 2) +DO_STN_2(4, hh, 2, 2) + +DO_STN_2(1, ss, 4, 4) +DO_STN_2(1, sd, 8, 4) +DO_STN_2(2, ss, 4, 4) +DO_STN_2(3, ss, 4, 4) +DO_STN_2(4, ss, 4, 4) + +DO_STN_2(1, dd, 8, 8) +DO_STN_2(2, dd, 8, 8) +DO_STN_2(3, dd, 8, 8) +DO_STN_2(4, dd, 8, 8) + +#undef DO_STN_1 +#undef DO_STN_2 + /* Loads with a vector index. */ #define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \