From patchwork Tue Oct 10 00:55:46 2017
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 115316
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, "Emilio G. Cota"
Date: Mon, 9 Oct 2017 17:55:46 -0700
Message-Id: <20171010005600.28735-10-richard.henderson@linaro.org>
In-Reply-To: <20171010005600.28735-1-richard.henderson@linaro.org>
References: <20171010005600.28735-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PULL 09/23] tcg: consolidate TB lookups in tb_lookup__cpu_state

From: "Emilio G. Cota"

This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Sets the TB's tb->invalid flag
        (B) Removes the TB from tb_htable
        (B) Clears all CPUs' tb_jmp_cache
   (A) Stores TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never
happen. We can fix this by checking the tb->invalid flag every time we
look up a TB from tb_jmp_cache, so that in the above scenario, next time
we try to find that TB in tb_jmp_cache, we won't, and will therefore be
forced to look it up in tb_htable.

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

 Performance counter stats for 'taskset -c 0 qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=jessie.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      18714.917392 task-clock                #    0.952 CPUs utilized            ( +- 0.95% )
            23,142 context-switches          #    0.001 M/sec                    ( +- 0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            10,558 page-faults               #    0.001 M/sec                    ( +- 0.95% )
    53,957,727,252 cycles                    #    2.883 GHz                      ( +- 0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +- 1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend    #   30.57% backend cycles idle      ( +- 0.95% ) [66.66%]
    76,267,572,582 instructions              #    1.41 insns per cycle
                                             #    0.32 stalled cycles per insn   ( +- 0.87% ) [83.34%]
    12,692,186,323 branches                  #  678.186 M/sec                    ( +- 0.92% ) [83.35%]
       263,486,879 branch-misses             #    2.08% of all branches          ( +- 0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                          ( +- 0.82% )

After, w/ inline (this patch):
      18471.376627 task-clock                #    0.955 CPUs utilized            ( +- 0.96% )
            23,048 context-switches          #    0.001 M/sec                    ( +- 0.48% )
                 1 CPU-migrations            #    0.000 M/sec
            10,708 page-faults               #    0.001 M/sec                    ( +- 0.81% )
    53,208,990,796 cycles                    #    2.881 GHz                      ( +- 0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +- 0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend    #   30.37% backend cycles idle      ( +- 0.76% ) [66.67%]
    75,786,269,766 instructions              #    1.42 insns per cycle
                                             #    0.32 stalled cycles per insn   ( +- 1.24% ) [83.34%]
    12,573,617,143 branches                  #  680.708 M/sec                    ( +- 1.34% ) [83.33%]
       260,235,550 branch-misses             #    2.07% of all branches          ( +- 0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                          ( +- 0.56% )

After, w/o inline:
      18791.253967 task-clock                #    0.954 CPUs utilized            ( +- 0.78% )
            23,230 context-switches          #    0.001 M/sec                    ( +- 0.42% )
                 1 CPU-migrations            #    0.000 M/sec
            10,563 page-faults               #    0.001 M/sec                    ( +- 1.27% )
    54,168,674,622 cycles                    #    2.883 GHz                      ( +- 0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +- 1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend    #   30.07% backend cycles idle      ( +- 0.95% ) [66.66%]
    77,659,755,503 instructions              #    1.43 insns per cycle
                                             #    0.31 stalled cycles per insn   ( +- 0.97% ) [83.34%]
    12,922,780,045 branches                  #  687.702 M/sec                    ( +- 1.06% ) [83.34%]
       261,962,386 branch-misses             #    2.03% of all branches          ( +- 0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                          ( +- 0.56% )

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
Signed-off-by: Richard Henderson
---
 include/exec/tb-lookup.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 accel/tcg/cpu-exec.c     | 47 ++++++++++++++++++----------------------------
 accel/tcg/tcg-runtime.c  | 24 ++++++------------------
 3 files changed, 73 insertions(+), 47 deletions(-)
 create mode 100644 include/exec/tb-lookup.h

-- 
2.13.6

diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
new file mode 100644
index 0000000000..9d32cb0c6e
--- /dev/null
+++ b/include/exec/tb-lookup.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2017, Emilio G. Cota
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef EXEC_TB_LOOKUP_H
+#define EXEC_TB_LOOKUP_H
+
+#include "qemu/osdep.h"
+
+#ifdef NEED_CPU_H
+#include "cpu.h"
+#else
+#include "exec/poison.h"
+#endif
+
+#include "exec/exec-all.h"
+#include "exec/tb-hash.h"
+
+/* Might cause an exception, so have a longjmp destination ready */
+static inline TranslationBlock *
+tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
+                     uint32_t *flags)
+{
+    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
+    TranslationBlock *tb;
+    uint32_t hash;
+
+    cpu_get_tb_cpu_state(env, pc, cs_base, flags);
+    hash = tb_jmp_cache_hash_func(*pc);
+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
+    if (likely(tb &&
+               tb->pc == *pc &&
+               tb->cs_base == *cs_base &&
+               tb->flags == *flags &&
+               tb->trace_vcpu_dstate == *cpu->trace_dstate &&
+               !atomic_read(&tb->invalid))) {
+        return tb;
+    }
+    tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
+    if (tb == NULL) {
+        return NULL;
+    }
+    atomic_set(&cpu->tb_jmp_cache[hash], tb);
+    return tb;
+}
+
+#endif /* EXEC_TB_LOOKUP_H */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 32104b8d8c..f8a1d68db7 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -28,6 +28,7 @@
 #include "exec/address-spaces.h"
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
+#include "exec/tb-lookup.h"
 #include "exec/log.h"
 #include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
@@ -368,43 +369,31 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
                                         TranslationBlock *last_tb,
                                         int tb_exit)
 {
-    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
     bool acquired_tb_lock = false;
 
-    /* we record a subset of the CPU state. It will
-       always be the same before a given translated block
-       is executed. */
-    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
-    if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
-                 tb->flags != flags ||
-                 tb->trace_vcpu_dstate != *cpu->trace_dstate)) {
-        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-        if (!tb) {
-
-            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-             * taken outside tb_lock. As system emulation is currently
-             * single threaded the locks are NOPs.
-             */
-            mmap_lock();
-            tb_lock();
-            acquired_tb_lock = true;
-
-            /* There's a chance that our desired tb has been translated while
-             * taking the locks so we check again inside the lock.
-             */
-            tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-            if (!tb) {
-                /* if no translated code available, then translate it now */
-                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-            }
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    if (tb == NULL) {
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock. As system emulation is currently
+         * single threaded the locks are NOPs.
+         */
+        mmap_lock();
+        tb_lock();
+        acquired_tb_lock = true;
 
-            mmap_unlock();
+        /* There's a chance that our desired tb has been translated while
+         * taking the locks so we check again inside the lock.
+         */
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+        if (likely(tb == NULL)) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
         }
 
+        mmap_unlock();
         /* We add the TB in the virtual pc hash table for the fast lookup */
         atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     }
diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index b75394aba8..d0edd944b0 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -27,7 +27,7 @@
 #include "exec/helper-proto.h"
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
-#include "exec/tb-hash.h"
+#include "exec/tb-lookup.h"
 #include "disas/disas.h"
 #include "exec/log.h"
 
@@ -149,24 +149,12 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env)
     CPUState *cpu = ENV_GET_CPU(env);
     TranslationBlock *tb;
     target_ulong cs_base, pc;
-    uint32_t flags, hash;
-
-    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    hash = tb_jmp_cache_hash_func(pc);
-    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
-
-    if (unlikely(!(tb
-                   && tb->pc == pc
-                   && tb->cs_base == cs_base
-                   && tb->flags == flags
-                   && tb->trace_vcpu_dstate == *cpu->trace_dstate))) {
-        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-        if (!tb) {
-            return tcg_ctx.code_gen_epilogue;
-        }
-        atomic_set(&cpu->tb_jmp_cache[hash], tb);
-    }
+    uint32_t flags;
 
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    if (tb == NULL) {
+        return tcg_ctx.code_gen_epilogue;
+    }
     qemu_log_mask_and_addr(CPU_LOG_EXEC, pc,
                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
                            tb->tc_ptr, cpu->cpu_index, pc,