From patchwork Thu Aug 9 04:21:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 143663 Delivered-To: patch@linaro.org Received: by 2002:a2e:9754:0:0:0:0:0 with SMTP id f20-v6csp1605945ljj; Wed, 8 Aug 2018 21:28:21 -0700 (PDT) X-Google-Smtp-Source: AA+uWPwl9odD60WV6ePuGl2XmfKWWhrK4w1FPzSWzJnAzo6Xwv9p7KF5NJrCVPXIu3mvRMEoRA8n X-Received: by 2002:a0c:f611:: with SMTP id r17-v6mr415109qvm.103.1533788900977; Wed, 08 Aug 2018 21:28:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1533788900; cv=none; d=google.com; s=arc-20160816; b=gbXF0H84xYEatEdy4La3wFyllPQrIUChCBtFFcumnvKkleGOXod149cax5pI0zWqhb ey5SEhCuPdCS3r6aGiQleYbb79qDrSC/m0o14WRm7078AyDbxXkR0SatT7HvYHbxi/kn GedtJLJH1I2wXuGUMYuqppxyi6YEZFf/G352a0aekfiyy+/3/gjTTNOBFtRX+9dGu4VH VyLx8cB5ubHrmrAHo8ZFHXavDiCa8kO8iZXkGFKNpbrUPM9Id+AqLVbmj4s1hrLKJCoU 4KriRgDFifEAtx9c05OUWOXOX2InXc2hnAr21E9Hly8mZ1FO3zJWem0S4Q7kRgD9KEpu Lgkg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:references:in-reply-to :message-id:date:to:from:dkim-signature:arc-authentication-results; bh=g6wk8E63hRi1UpPz695juSTD3aMNhCCM/dwAKrSVuhg=; b=DTwsGbIdh5jRgP45TTgF0/aCbifKHJAT+wYGZ76Uu+W3dIWpNq8+nUXcwKLpl5nOWC j4hkILiydP53QESlHyF0B5b4FYbnAvlXcP+bwbtge0BK/X923tS9MJVTPzD3o0cwD2z1 8iG96/9NZbiCZPHjKGaScct9DGPY5eq2reeYkyUcneY59uzAc8ygzJzG0aMaIcs1jqsf 3D7FgLEplAd+mtz5a7ZGtIrjgZG+JVWL+l8Hzhwr9Ld0mOgkFSLfEDsa38uOXUahY2D0 WW4n4n0B6Jtt2lzCdAaEBR49vR3Efsj20z/Fv3V70SPxZaH8qGdwpdG9ULH0G3f8eNyi ychw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=LiUBaX8s; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id u25-v6si1551709qva.197.2018.08.08.21.28.20 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 08 Aug 2018 21:28:20 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Authentication-Results: mx.google.com; dkim=fail header.i=@linaro.org header.s=google header.b=LiUBaX8s; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1]:47252 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fncYe-00068i-93 for patch@linaro.org; Thu, 09 Aug 2018 00:28:20 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54087) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fncT1-0000xw-KT for qemu-devel@nongnu.org; Thu, 09 Aug 2018 00:22:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fncSy-0007N7-BM for qemu-devel@nongnu.org; Thu, 09 Aug 2018 00:22:31 -0400 Received: from mail-pg1-x52f.google.com ([2607:f8b0:4864:20::52f]:41708) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1fncSy-0007MW-28 for qemu-devel@nongnu.org; Thu, 09 Aug 2018 00:22:28 -0400 Received: by mail-pg1-x52f.google.com with SMTP id z8-v6so2107922pgu.8 for ; Wed, 08 Aug 2018 21:22:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=g6wk8E63hRi1UpPz695juSTD3aMNhCCM/dwAKrSVuhg=; b=LiUBaX8sarivR3S6lGs6Sw+1cTOa+LNHH6ROWn2C6nUiu6Y1gDJkNSkAc1nCQGvPzl /7NkCk7MwYrwKggw49pdo+KWsAM6SPJodo8WANgZEWQiizW4zHJZ3f1o+FIWlZ77DM02 3MfBEpxNRnVZwvCe6MLda8J3ZyXgHRHCW0U28= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=g6wk8E63hRi1UpPz695juSTD3aMNhCCM/dwAKrSVuhg=; b=CMxTzDpsg7wGmQ/WIzxdRY3vKoinfvyRgfPolwjg+VSG3zpPydbc9eDc6qGF8c9kTM bgiEvUq/SsQghFHq5WZ2wWnrThB9l3YNYNuCtTyTUSkTkHMyNYJQeFtqleD/fenKCi3D uS3QOo+YDg8rSXigeM9BOwXy32w+4tCwossXsX65Re3zbb+i5YmkLTFJmWYBgsMV61JV b+q4s6Z4qDwMmGAPqD3gAKrhA0yKQaXc1WiOKKn9sGPZzh3ly8u6A0BRdgWv/untjsbM lda7ltMQ1Eo3kq9COEylesbdKT/mqEVMbwQLwR5yHIbfXR0Sd6VkzQ98hr2fV+5Ejxr5 I8Jw== X-Gm-Message-State: AOUpUlHrxQgUMInO/uybAPydbXJj1eMa4boRdGiMNexg49pgKFHxSR3f mHJWwS7VRvxc+7JJ1Di5acks8GDwioE= X-Received: by 2002:a63:e14a:: with SMTP id h10-v6mr526936pgk.358.1533788546788; Wed, 08 Aug 2018 21:22:26 -0700 (PDT) Received: from cloudburst.twiddle.net (97-113-8-179.tukw.qwest.net. [97.113.8.179]) by smtp.gmail.com with ESMTPSA id m30-v6sm7355799pff.121.2018.08.08.21.22.25 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 08 Aug 2018 21:22:25 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Date: Wed, 8 Aug 2018 21:21:59 -0700 Message-Id: <20180809042206.15726-14-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180809042206.15726-1-richard.henderson@linaro.org> References: <20180809042206.15726-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::52f Subject: [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: laurent.desnogues@gmail.com, peter.maydell@linaro.org, alex.bennee@linaro.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" Use the same *_tlb primitives as we use for ld1. This is not a significant change, but does (for linux-user) hoist the set of helper_retaddr, and (for softmmu) hoist the computation of the current mmu_idx outside the loop. This does fix the endianness problem for softmmu, and does move the main loop out of a macro and into an inlined function. Signed-off-by: Richard Henderson --- target/arm/sve_helper.c | 210 ++++++++++++++++++++++------------------ 1 file changed, 117 insertions(+), 93 deletions(-) -- 2.17.1 Reviewed-by: Peter Maydell diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 4ca9412e20..5cc7de5077 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -4398,109 +4398,133 @@ DO_LD1_2(ld1dd, 3, 3) #undef DO_LD1_1 #undef DO_LD1_2 -#define DO_LD2(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - TYPEM m1 = 0, m2 = 0; \ - if (pg & 1) { \ - m1 = FN(env, addr, ra); \ - m2 = FN(env, addr + sizeof(TYPEM), ra); \ - } \ - *(TYPEE *)(d1 + H(i)) = m1; \ - *(TYPEE *)(d2 + H(i)) = m2; \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 2 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ +/* + * Common helpers for all contiguous 2,3,4-register predicated loads. + */ +static void sve_ld2_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, int size, uintptr_t ra, + sve_ld1_tlb_fn *tlb_fn) +{ + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); + unsigned rd = simd_data(desc); + ARMVectorReg scratch[2] = { }; + + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra); + tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra); + } + i += size, pg >>= size; + addr += 2 * size; + } while (i & 15); + } + set_helper_retaddr(0); + + /* Wait until all exceptions have been raised to write back. */ + memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz); + memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz); } -#define DO_LD3(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - TYPEM m1 = 0, m2 = 0, m3 = 0; \ - if (pg & 1) { \ - m1 = FN(env, addr, ra); \ - m2 = FN(env, addr + sizeof(TYPEM), ra); \ - m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \ - } \ - *(TYPEE *)(d1 + H(i)) = m1; \ - *(TYPEE *)(d2 + H(i)) = m2; \ - *(TYPEE *)(d3 + H(i)) = m3; \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 3 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ +static void sve_ld3_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, int size, uintptr_t ra, + sve_ld1_tlb_fn *tlb_fn) +{ + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); + unsigned rd = simd_data(desc); + ARMVectorReg scratch[3] = { }; + + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra); + tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra); + tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra); + } + i += size, pg >>= size; + addr += 3 * size; + } while (i & 15); + } + set_helper_retaddr(0); + + /* Wait until all exceptions have been raised to write back. */ + memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz); + memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz); + memcpy(&env->vfp.zregs[(rd + 2) & 31], &scratch[2], oprsz); } -#define DO_LD4(NAME, FN, TYPEE, TYPEM, H) \ -void HELPER(NAME)(CPUARMState *env, void *vg, \ - target_ulong addr, uint32_t desc) \ -{ \ - intptr_t i, oprsz = simd_oprsz(desc); \ - intptr_t ra = GETPC(); \ - unsigned rd = simd_data(desc); \ - void *d1 = &env->vfp.zregs[rd]; \ - void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \ - void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \ - void *d4 = &env->vfp.zregs[(rd + 3) & 31]; \ - for (i = 0; i < oprsz; ) { \ - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ - do { \ - TYPEM m1 = 0, m2 = 0, m3 = 0, m4 = 0; \ - if (pg & 1) { \ - m1 = FN(env, addr, ra); \ - m2 = FN(env, addr + sizeof(TYPEM), ra); \ - m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \ - m4 = FN(env, addr + 3 * sizeof(TYPEM), ra); \ - } \ - *(TYPEE *)(d1 + H(i)) = m1; \ - *(TYPEE *)(d2 + H(i)) = m2; \ - *(TYPEE *)(d3 + H(i)) = m3; \ - *(TYPEE *)(d4 + H(i)) = m4; \ - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \ - addr += 4 * sizeof(TYPEM); \ - } while (i & 15); \ - } \ +static void sve_ld4_r(CPUARMState *env, void *vg, target_ulong addr, + uint32_t desc, int size, uintptr_t ra, + sve_ld1_tlb_fn *tlb_fn) +{ + const int mmu_idx = cpu_mmu_index(env, false); + intptr_t i, oprsz = simd_oprsz(desc); + unsigned rd = simd_data(desc); + ARMVectorReg scratch[4] = { }; + + set_helper_retaddr(ra); + for (i = 0; i < oprsz; ) { + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); + do { + if (pg & 1) { + tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra); + tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra); + tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra); + tlb_fn(env, &scratch[3], i, addr + 3 * size, mmu_idx, ra); + } + i += size, pg >>= size; + addr += 4 * size; + } while (i & 15); + } + set_helper_retaddr(0); + + /* Wait until all exceptions have been raised to write back. */ + memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz); + memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz); + memcpy(&env->vfp.zregs[(rd + 2) & 31], &scratch[2], oprsz); + memcpy(&env->vfp.zregs[(rd + 3) & 31], &scratch[3], oprsz); } -DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) -DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) -DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1) +#define DO_LDN_1(N) \ +void __attribute__((flatten)) HELPER(sve_ld##N##bb_r) \ + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \ +{ \ + sve_ld##N##_r(env, vg, addr, desc, 1, GETPC(), sve_ld1bb_tlb); \ +} -DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) -DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) -DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2) +#define DO_LDN_2(N, SUFF, SIZE) \ +void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_r) \ + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \ +{ \ + sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(), \ + arm_cpu_data_is_big_endian(env) \ + ? sve_ld1##SUFF##_be_tlb : sve_ld1##SUFF##_le_tlb); \ +} -DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) -DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) -DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4) +DO_LDN_1(2) +DO_LDN_1(3) +DO_LDN_1(4) -DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) -DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) -DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, ) +DO_LDN_2(2, hh, 2) +DO_LDN_2(3, hh, 2) +DO_LDN_2(4, hh, 2) -#undef DO_LD2 -#undef DO_LD3 -#undef DO_LD4 +DO_LDN_2(2, ss, 4) +DO_LDN_2(3, ss, 4) +DO_LDN_2(4, ss, 4) + +DO_LDN_2(2, dd, 8) +DO_LDN_2(3, dd, 8) +DO_LDN_2(4, dd, 8) + +#undef DO_LDN_1 +#undef DO_LDN_2 /* * Load contiguous data, first-fault and no-fault.