From: Alex Bennée
To: qemu-devel@nongnu.org
Cc: Alex Bennée, Cédric Le Goater, Richard Henderson, Paolo Bonzini,
    Alistair Francis, Eduardo Habkost, Marcel Apfelbaum,
    Philippe Mathieu-Daudé, Yanan Wang, Daniel P. Berrangé
Subject: [RFC PATCH] cputlb and ssi: cache class to avoid expensive object_dynamic_cast_assert (HACK!!!)
Date: Thu, 4 Aug 2022 10:20:44 +0100
Message-Id: <20220804092044.2101093-1-alex.bennee@linaro.org>

Investigating why some BMC models are so slow compared to a plain ARM
virt machine, I did some profiling of:

  ./qemu-system-arm -M romulus-bmc -nic user \
    -drive file=obmc-phosphor-image-romulus.static.mtd,format=raw,if=mtd \
    -nographic -serial mon:stdio

and saw that object_dynamic_cast was dominating the profile times. We
have a number of cases in the CPU hot path and, more importantly for
this model, in the SSI bus. As the class is static once the object is
created, we just cache it and use it instead of the dynamic cast
macros.

[AJB: I suspect a proper fix for this is for QOM to support a cached
class lookup; an abortive macro attempt is #if 0'd in this patch.]
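The caching idea can be sketched outside QEMU. The following is a toy model only, not QOM: `ToyClass`, `ToyObject`, `toy_checked_get_class`, `toy_get_class_cached` and `toy_count_slow_lookups` are hypothetical names, and the string-compare walk up the parent chain merely stands in for the kind of work a checked class lookup may do on every call. It relies on the same assumption as the patch, namely that an object's class never changes after creation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of a class hierarchy; this is not QEMU's QOM. */
typedef struct ToyClass {
    const char *name;
    const struct ToyClass *parent;
} ToyClass;

typedef struct ToyObject {
    const ToyClass *klass;   /* fixed once the object has been created */
    const ToyClass *cached;  /* lazily filled cache, NULL until first use */
} ToyObject;

static unsigned long toy_slow_lookups; /* counts checked lookups */

/* Checked lookup: walk the parent chain comparing type names. */
static const ToyClass *toy_checked_get_class(const ToyObject *obj,
                                             const char *type_name)
{
    toy_slow_lookups++;
    for (const ToyClass *c = obj->klass; c; c = c->parent) {
        if (strcmp(c->name, type_name) == 0) {
            return obj->klass;
        }
    }
    return NULL;
}

/* Cached lookup: pay for the checked walk once, then reuse the pointer. */
static const ToyClass *toy_get_class_cached(ToyObject *obj)
{
    if (!obj->cached) {
        obj->cached = toy_checked_get_class(obj, "cpu");
    }
    return obj->cached;
}

/* Run the cached getter `calls` times; return how many slow walks it cost. */
static unsigned long toy_count_slow_lookups(int calls)
{
    static const ToyClass object_class = { "object", NULL };
    static const ToyClass device_class = { "device", &object_class };
    static const ToyClass cpu_class = { "cpu", &device_class };
    static const ToyClass arm_cpu_class = { "arm-cpu", &cpu_class };
    ToyObject obj = { &arm_cpu_class, NULL };
    unsigned long before = toy_slow_lookups;

    for (int i = 0; i < calls; i++) {
        const ToyClass *k = toy_get_class_cached(&obj);
        assert(k == &arm_cpu_class);
    }
    return toy_slow_lookups - before;
}
```

In this toy model, `toy_count_slow_lookups(1000)` returns 1: only the first call walks the hierarchy. The trade-off is an extra pointer per instance and a NULL check on each call.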
Signed-off-by: Alex Bennée
Cc: Cédric Le Goater
---
 include/hw/core/cpu.h |  2 ++
 include/hw/ssi/ssi.h  |  3 +++
 include/qom/object.h  | 29 +++++++++++++++++++++++++++++
 accel/tcg/cputlb.c    | 12 ++++++++----
 hw/ssi/ssi.c          | 10 +++++++---
 5 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 500503da13..70027a772e 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -317,6 +317,8 @@ struct qemu_work_item;
 struct CPUState {
     /*< private >*/
     DeviceState parent_obj;
+    /* cache to avoid expensive CPU_GET_CLASS */
+    CPUClass *cc;
     /*< public >*/
 
     int nr_cores;
diff --git a/include/hw/ssi/ssi.h b/include/hw/ssi/ssi.h
index f411858ab0..6950f86810 100644
--- a/include/hw/ssi/ssi.h
+++ b/include/hw/ssi/ssi.h
@@ -59,6 +59,9 @@ struct SSIPeripheralClass {
 struct SSIPeripheral {
     DeviceState parent_obj;
 
+    /* cache the class */
+    SSIPeripheralClass *spc;
+
     /* Chip select state */
     bool cs;
 };
diff --git a/include/qom/object.h b/include/qom/object.h
index ef7258a5e1..2202dbfa43 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -198,6 +198,35 @@ struct Object
     OBJ_NAME##_CLASS(const void *klass) \
     { return OBJECT_CLASS_CHECK(ClassType, klass, TYPENAME); }
 
+#if 0
+/**
+ * DECLARE_CACHED_CLASS_CHECKER:
+ * @InstanceType: instance struct name
+ * @ClassType: class struct name
+ * @OBJ_NAME: the object name in uppercase with underscore separators
+ * @TYPENAME: type name
+ *
+ * This variant of DECLARE_CLASS_CHECKERS allows for the caching of
+ * class in the parent object instance. This is useful for very hot
+ * path code at the expense of an extra indirection and check. As per
+ * the original direct usage of this macro should be avoided if the
+ * complete OBJECT_DECLARE_TYPE macro has been used.
+ *
+ * This macro will provide the class type cast functions for a
+ * QOM type.
+ */
+#define DECLARE_CACHED_CLASS_CHECKERS(InstanceType, ClassType, OBJ_NAME, TYPENAME) \
+    DECLARE_CLASS_CHECKERS(ClassType, OBJ_NAME, TYPENAME) \
+    static inline G_GNUC_UNUSED ClassType * \
+    OBJ_NAME##_GET_CACHED_CLASS(const void *obj) \
+    { \
+        InstanceType *p = (InstanceType *) obj; \
+        p->cc = p->cc ? p->cc : OBJECT_GET_CLASS(ClassType, obj, TYPENAME);\
+        return p->cc; \
+    }
+
+#endif
+
 /**
  * DECLARE_OBJ_CHECKERS:
  * @InstanceType: instance struct name
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index a46f3a654d..882315f7dd 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1303,8 +1303,9 @@ static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
 static void tlb_fill(CPUState *cpu, target_ulong addr, int size,
                      MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
 {
-    CPUClass *cc = CPU_GET_CLASS(cpu);
+    CPUClass *cc = cpu->cc ? cpu->cc : CPU_GET_CLASS(cpu);
     bool ok;
+    cpu->cc = cc;
 
     /*
      * This is not a probe, so only valid return is success; failure
@@ -1319,7 +1320,8 @@ static inline void cpu_unaligned_access(CPUState *cpu, vaddr addr,
                                         MMUAccessType access_type,
                                         int mmu_idx, uintptr_t retaddr)
 {
-    CPUClass *cc = CPU_GET_CLASS(cpu);
+    CPUClass *cc = cpu->cc ? cpu->cc : CPU_GET_CLASS(cpu);
+    cpu->cc = cc;
 
     cc->tcg_ops->do_unaligned_access(cpu, addr, access_type, mmu_idx, retaddr);
 }
@@ -1331,7 +1333,8 @@ static inline void cpu_transaction_failed(CPUState *cpu, hwaddr physaddr,
                                           MemTxResult response,
                                           uintptr_t retaddr)
 {
-    CPUClass *cc = CPU_GET_CLASS(cpu);
+    CPUClass *cc = cpu->cc ? cpu->cc : CPU_GET_CLASS(cpu);
+    cpu->cc = cc;
 
     if (!cpu->ignore_memory_transaction_failures &&
         cc->tcg_ops->do_transaction_failed) {
@@ -1606,7 +1609,8 @@ static int probe_access_internal(CPUArchState *env, target_ulong addr,
     if (!tlb_hit_page(tlb_addr, page_addr)) {
         if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs, page_addr)) {
             CPUState *cs = env_cpu(env);
-            CPUClass *cc = CPU_GET_CLASS(cs);
+            CPUClass *cc = cs->cc ? cs->cc : CPU_GET_CLASS(cs);
+            cs->cc = cc;
 
             if (!cc->tcg_ops->tlb_fill(cs, addr, fault_size, access_type,
                                        mmu_idx, nonfault, retaddr)) {
diff --git a/hw/ssi/ssi.c b/hw/ssi/ssi.c
index 003931fb50..f749feb6e3 100644
--- a/hw/ssi/ssi.c
+++ b/hw/ssi/ssi.c
@@ -38,7 +38,8 @@ static void ssi_cs_default(void *opaque, int n, int level)
     bool cs = !!level;
     assert(n == 0);
     if (s->cs != cs) {
-        SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(s);
+        /* SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(s); */
+        SSIPeripheralClass *ssc = s->spc;
         if (ssc->set_cs) {
             ssc->set_cs(s, cs);
         }
@@ -48,7 +49,8 @@ static void ssi_cs_default(void *opaque, int n, int level)
 
 static uint32_t ssi_transfer_raw_default(SSIPeripheral *dev, uint32_t val)
 {
-    SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(dev);
+    /* SSIPeripheralClass *ssc = SSI_PERIPHERAL_GET_CLASS(dev); */
+    SSIPeripheralClass *ssc = dev->spc;
 
     if ((dev->cs && ssc->cs_polarity == SSI_CS_HIGH) ||
         (!dev->cs && ssc->cs_polarity == SSI_CS_LOW) ||
@@ -67,6 +69,7 @@ static void ssi_peripheral_realize(DeviceState *dev, Error **errp)
             ssc->cs_polarity != SSI_CS_NONE) {
         qdev_init_gpio_in_named(dev, ssi_cs_default, SSI_GPIO_CS, 1);
     }
+    s->spc = ssc;
 
     ssc->realize(s, errp);
 }
@@ -120,7 +123,8 @@ uint32_t ssi_transfer(SSIBus *bus, uint32_t val)
 
     QTAILQ_FOREACH(kid, &b->children, sibling) {
         SSIPeripheral *peripheral = SSI_PERIPHERAL(kid->child);
-        ssc = SSI_PERIPHERAL_GET_CLASS(peripheral);
+        /* ssc = SSI_PERIPHERAL_GET_CLASS(peripheral); */
+        ssc = peripheral->spc;
         r |= ssc->transfer_raw(peripheral, val);
     }
 
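
The #if 0'd macro in this patch hard-codes the cache field as `p->cc`, while `SSIPeripheral` names its field `spc`. One possible shape for a generic helper, sketched here outside QOM, is to pass the field name and the slow lookup as macro parameters. Every name below (`DemoClass`, `DemoDevice`, `DEFINE_CACHED_CLASS_GETTER`, `demo_slow_get_class`, `demo_run`) is hypothetical; this illustrates the pattern only and is not a proposed QOM API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a QOM class and instance; not QEMU code. */
typedef struct DemoClass {
    int (*transfer)(int val);
} DemoClass;

typedef struct DemoDevice {
    DemoClass *cached_class;   /* NULL until the first lookup */
} DemoDevice;

static int demo_lookups;       /* how often the slow path ran */

static int demo_transfer(int val)
{
    return val + 1;
}

static DemoClass demo_class = { demo_transfer };

/* Stands in for an expensive checked class lookup. */
static DemoClass *demo_slow_get_class(DemoDevice *obj)
{
    (void)obj;
    demo_lookups++;
    return &demo_class;
}

/* Generate a getter that fills obj->field on first use, then reuses it. */
#define DEFINE_CACHED_CLASS_GETTER(InstanceType, ClassType, fn, field, slow) \
    static inline ClassType *fn(InstanceType *obj)                          \
    {                                                                       \
        if (!obj->field) {                                                  \
            obj->field = slow(obj);                                         \
        }                                                                   \
        return obj->field;                                                  \
    }

DEFINE_CACHED_CLASS_GETTER(DemoDevice, DemoClass, demo_get_cached_class,
                           cached_class, demo_slow_get_class)

/* Call the getter n times; return how many slow lookups that cost. */
static int demo_run(int n)
{
    DemoDevice dev = { NULL };
    int before = demo_lookups;

    for (int i = 0; i < n; i++) {
        int r = demo_get_cached_class(&dev)->transfer(i);
        assert(r == i + 1);
    }
    return demo_lookups - before;
}
```

With this sketch, `demo_run(100)` performs a single slow lookup. Unlike the disabled macro, the getter takes a non-const instance pointer, since filling the cache writes to the object.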