From patchwork Wed Oct 23 11:33:56 2024
X-Patchwork-Submitter: Alex Bennée
X-Patchwork-Id: 837804
From:
Alex Bennée
To: qemu-devel@nongnu.org
Cc: Zhao Liu, Eduardo Habkost, Pierrick Bouvier, Peter Maydell, Yanan Wang, John Snow, Philippe Mathieu-Daudé, devel@lists.libvirt.org, Marcelo Tosatti, Riku Voipio, Wainer dos Santos Moschetta, Marcel Apfelbaum, Paolo Bonzini, Mahmoud Mandour, Laurent Vivier, Jiaxun Yang, "Edgar E. Iglesias", Richard Henderson, Alexandre Iooss, qemu-arm@nongnu.org, Daniel P. Berrangé, Marc-André Lureau, Cleber Rosa, kvm@vger.kernel.org, Beraldo Leal, Thomas Huth, Alex Bennée
Subject: [PATCH v3 08/18] accel/tcg: add tracepoints for cpu_loop_exit_atomic
Date: Wed, 23 Oct 2024 12:33:56 +0100
Message-Id: <20241023113406.1284676-9-alex.bennee@linaro.org>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20241023113406.1284676-1-alex.bennee@linaro.org>
References: <20241023113406.1284676-1-alex.bennee@linaro.org>
MIME-Version: 1.0

We try to avoid using cpu_loop_exit_atomic as it brings in an all-core
sync point. However, on some CPU/kernel/benchmark combinations it is
starting to show up in the performance profile. To make it easier to
see what's going on, add tracepoints for the slow path so we can see
what is triggering the wait.

It seems that for a modern CPU model the slow path can be hit quite often,
for example:

  ./qemu-system-aarch64 \
    -machine type=virt,virtualization=on,pflash0=rom,pflash1=efivars,gic-version=max \
    -smp 4 \
    -accel tcg \
    -device virtio-net-pci,netdev=unet \
    -device virtio-scsi-pci \
    -device scsi-hd,drive=hd \
    -netdev user,id=unet,hostfwd=tcp::2222-:22 \
    -blockdev driver=raw,node-name=hd,file.driver=host_device,file.filename=/dev/zen-ssd2/trixie-arm64,discard=unmap \
    -serial mon:stdio \
    -blockdev node-name=rom,driver=file,filename=(pwd)/pc-bios/edk2-aarch64-code.fd,read-only=true \
    -blockdev node-name=efivars,driver=file,filename=$HOME/images/qemu-arm64-efivars \
    -m 8192 \
    -object memory-backend-memfd,id=mem,size=8G,share=on \
    -kernel /home/alex/lsrc/linux.git/builds/arm64/arch/arm64/boot/Image -append "root=/dev/sda2 console=ttyAMA0 systemd.unit=benchmark-stress-ng.service" \
    -display none -d trace:load_atom\*_fallback,trace:store_atom\*_fallback

With: -cpu neoverse-v1,pauth-impdef=on => 2203343
With: -cpu cortex-a76 => 0

Signed-off-by: Alex Bennée
Reviewed-by: Richard Henderson
Reviewed-by: Pierrick Bouvier

---
v2
  - 0x prefixes for ra as per checkpatch
---
 accel/tcg/user-exec.c          |  2 +-
 accel/tcg/ldst_atomicity.c.inc |  9 +++++++++
 accel/tcg/trace-events         | 12 ++++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 51b2c16dbe..aa8af52cc3 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -29,7 +29,7 @@
 #include "exec/page-protection.h"
 #include "exec/helper-proto.h"
 #include "qemu/atomic128.h"
-#include "trace/trace-root.h"
+#include "trace.h"
 #include "tcg/tcg-ldst.h"
 #include "internal-common.h"
 #include "internal-target.h"
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 134da3c1da..c735add261 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -168,6 +168,7 @@ static uint64_t load_atomic8_or_exit(CPUState *cpu, uintptr_t ra, void *pv)
 #endif
 
     /* Ultimate fallback: re-execute in serial context. */
+    trace_load_atom8_or_exit_fallback(ra);
     cpu_loop_exit_atomic(cpu, ra);
 }
 
@@ -212,6 +213,7 @@ static Int128 load_atomic16_or_exit(CPUState *cpu, uintptr_t ra, void *pv)
     }
 
     /* Ultimate fallback: re-execute in serial context. */
+    trace_load_atom16_or_exit_fallback(ra);
     cpu_loop_exit_atomic(cpu, ra);
 }
 
@@ -519,6 +521,7 @@ static uint64_t load_atom_8(CPUState *cpu, uintptr_t ra,
         if (HAVE_al8) {
             return load_atom_extract_al8x2(pv);
         }
+        trace_load_atom8_fallback(memop, ra);
         cpu_loop_exit_atomic(cpu, ra);
     default:
         g_assert_not_reached();
@@ -563,6 +566,7 @@ static Int128 load_atom_16(CPUState *cpu, uintptr_t ra,
         break;
     case MO_64:
         if (!HAVE_al8) {
+            trace_load_atom16_fallback(memop, ra);
             cpu_loop_exit_atomic(cpu, ra);
         }
         a = load_atomic8(pv);
@@ -570,6 +574,7 @@ static Int128 load_atom_16(CPUState *cpu, uintptr_t ra,
         break;
     case -MO_64:
         if (!HAVE_al8) {
+            trace_load_atom16_fallback(memop, ra);
             cpu_loop_exit_atomic(cpu, ra);
         }
         a = load_atom_extract_al8x2(pv);
@@ -897,6 +902,7 @@ static void store_atom_2(CPUState *cpu, uintptr_t ra,
         g_assert_not_reached();
     }
 
+    trace_store_atom2_fallback(memop, ra);
     cpu_loop_exit_atomic(cpu, ra);
 }
 
@@ -961,6 +967,7 @@ static void store_atom_4(CPUState *cpu, uintptr_t ra,
             return;
         }
     }
+        trace_store_atom4_fallback(memop, ra);
         cpu_loop_exit_atomic(cpu, ra);
     default:
         g_assert_not_reached();
@@ -1029,6 +1036,7 @@ static void store_atom_8(CPUState *cpu, uintptr_t ra,
     default:
         g_assert_not_reached();
     }
+    trace_store_atom8_fallback(memop, ra);
     cpu_loop_exit_atomic(cpu, ra);
 }
 
@@ -1107,5 +1115,6 @@ static void store_atom_16(CPUState *cpu, uintptr_t ra,
     default:
         g_assert_not_reached();
     }
+    trace_store_atom16_fallback(memop, ra);
     cpu_loop_exit_atomic(cpu, ra);
 }
diff --git a/accel/tcg/trace-events b/accel/tcg/trace-events
index 4e9b450520..14f638810c 100644
--- a/accel/tcg/trace-events
+++ b/accel/tcg/trace-events
@@ -12,3 +12,15 @@ memory_notdirty_set_dirty(uint64_t vaddr) "0x%" PRIx64
 
 # translate-all.c
 translate_block(void *tb, uintptr_t pc, const void *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p"
+
+# ldst_atomicity
+load_atom2_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
+load_atom4_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
+load_atom8_or_exit_fallback(uintptr_t ra) "ra:0x%"PRIxPTR""
+load_atom8_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
+load_atom16_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
+load_atom16_or_exit_fallback(uintptr_t ra) "ra:0x%"PRIxPTR""
+store_atom2_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
+store_atom4_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
+store_atom8_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
+store_atom16_fallback(uint32_t memop, uintptr_t ra) "mop:0x%"PRIx32", ra:0x%"PRIxPTR""
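
The totals quoted in the commit message can be broken down per
tracepoint by tallying the log. A minimal sketch, assuming the -d trace
output was captured to a file (for instance via QEMU's -D option) and
that each hit is logged on its own line containing the event name. The
function name count_atom_fallbacks and the file name fallback.log are
hypothetical and not part of this patch:

```shell
# Tally atomicity-fallback trace hits per event name, busiest first.
# $1: path to a log file holding the captured "-d trace:..." output.
count_atom_fallbacks() {
    # -o prints only the matched event name, one per line, so any
    # prefix or argument text around it in the log line is ignored.
    grep -oE '(load|store)_atom[0-9]+(_or_exit)?_fallback' "$1" \
        | sort | uniq -c | sort -rn
}
```

Running "count_atom_fallbacks fallback.log" prints one line per event
with its hit count, which should show which access size dominates for
a given -cpu model.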