From patchwork Thu Feb 15 08:14:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772879 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713320wre; Thu, 15 Feb 2024 00:16:47 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVuy2FnGLFqmEiIy443A5jyzyE+y5iAc4iw8H1wR49T4pQY0HxAY+nsxq8XSu8WTfByjEnUShP0K0MPgWTRZD8F X-Google-Smtp-Source: AGHT+IHtdS263i+j1GTPqCrBUBRioOFH4cGdTvyYmOALrR52a9xX4NI71noBszHHNlotie1ImmmZ X-Received: by 2002:a05:622a:1d1:b0:42a:3176:6b10 with SMTP id t17-20020a05622a01d100b0042a31766b10mr1310025qtw.32.1707985006802; Thu, 15 Feb 2024 00:16:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707985006; cv=none; d=google.com; s=arc-20160816; b=Zi7uIEbv50J8WUO19i5MyUgQdkcAP5OPiQw/rNZUz1tGc1CkK6duF3MVBfYOiHld8M pZbu4nr0fsWCpk50ejo/FfdHnaIOA+E/5FLprq3HjcT0KjHtKvlCD8N+U2XZ+UT5R9Ys Pqk3YugJosGcayfVig+odgDbK9blDyJROW8r2MQjJuz23lV3O/uZQDM2fplekVeTmyhk ftMUFJi/wJKeH/tsmK9DqwpIxQYxgro0wi9gI9Nptw7k2k65h6FnezDRpIY4+AaNokzL sH66HN0z8MqOobmhm2N2ZPwhW1zJAH1iYyhoJcg8qJ11w/LgxLh9o3DtaSZ5tWWhcWO4 n+MQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=m+LCDtngAv8k55rk9g+MqVczogAD6sQrdu8Jksal41mQSzYxWvwwTL3c3vdoEKuMix Gxt4nWimDO+HKg/fSgolSRJOinI1gDbIxoSMOxOkZH6C4+D2v/JV4skoOXXg6OgBN351 ZZ3hxPvtdaSivdfa4iP20wyqAdCC/emnmg47k493ZkbhqZcXAAy5BbWqMtUELi7tqPh+ 1Be5Xmj6Omeo5b6AEhICvdKHdY2zxKfcbsNmi9tJPlqKDRFXr5bHxxMp928UadT3Pb/w UZM9uAmgnQd1qssbos4qN49JcnJvrqzY4J5SYOzfjFxmGSKfgzdEhR1pijY0p9i/ADei a0lA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=awE855qc; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id bl2-20020a05620a1a8200b00785500f6090si1042626qkb.724.2024.02.15.00.16.46 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:16:46 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=awE855qc; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu9-0007gs-7B; Thu, 15 Feb 2024 03:15:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtU-00071U-B7 for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:57 -0500 Received: from mail-pj1-x1036.google.com ([2607:f8b0:4864:20::1036]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtS-0001N9-QO for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:56 -0500 Received: by mail-pj1-x1036.google.com with SMTP id 98e67ed59e1d1-295c8b795e2so460106a91.0 for ; Thu, 15 Feb 2024 00:14:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984893; x=1708589693; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=awE855qcctlZwnzAHPxC0TDCZW+3k7eu29biOnePABpynBxx8W4+IkEmcfczHkP0/f 88UWPFms5GZbajhPIWwi+2anMUEPCgW+j3kjrBun6DebIE3/PHKHDX7Z0UmqpdnPtB2D LS8AB1ysQFKNp/AHEcxQDr1+pRRsY1g36jzsi5ImhAiQHGdUOHVyq2n9A6hRMDbTYSMr q//6QQMHkB2riWQES0YOLEztexu8ewEb+OY8Q4dBx9zIZzewX5lv0/nIWLWCbYSABis/ 4t9r0gtqfIVQ+DkhhDKN3Guj5IP2TFj+SdxlQEQNhqdH5RViX53C2ZtHj2qDG+ecLpfg Vcdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984893; x=1708589693; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=hFqxsgEpKDueKx7FseqHYOi7l/M54ormWoR/qK9i0meA+pv69yESnfZCmtqDZG+bjA v4mnV2YFThISK2JNXrdFi8LraGqAATHgd37KIfLIFhmzGoncrOf3GPBVM16BLKYXx5De mazMCsV+e3PBZjF4IDhQBLhBYMC90LKbTpkuW+qkg2YDJY3771DyUJXtKYdMhXnkWsyn qlCjl1X9hkM2O/z8ZkqR5kb4/ELgKZr2MKwKNzQ8jArOTrcxoRRvcBvbye2iU20x+jvg 8qYYC1rwg3kKyFHSPWeDfVozD8hEEzks72hrZTKxzx49K7POe8tc4tQmVEmLzMoneFqj F1sQ== X-Gm-Message-State: AOJu0YxUbcvMvTwXLJRWzs2beUOiEblbDmXxdqDWdmR/XeYQLdShaMuM hp24M9B3JrW3JMMVPQKoJ2APZ1NiYgwAOZYCxb06x2NqCWBkY7FmPEHtkecvnQKG2SodH5K2aDX 0 X-Received: by 2002:a17:90b:4a02:b0:299:17a7:c443 with SMTP id kk2-20020a17090b4a0200b0029917a7c443mr274265pjb.32.1707984893328; Thu, 15 Feb 2024 00:14:53 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:52 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 01/10] util/bufferiszero: Remove SSE4.1 variant Date: Wed, 14 Feb 2024 22:14:40 -1000 Message-Id: <20240215081449.848220-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1036; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1036.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov The SSE4.1 variant is virtually identical to the SSE2 variant, except for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing if an SSE register is all zeroes. The PTEST instruction decodes to two uops, so it can be handled only by the complex decoder, and since CMP+JNE are macro-fused, both sequences decode to three uops. The uops comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch standpoint. Hence, the use of PTEST brings no benefit from throughput standpoint. Its latency is not important, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-2-amonakov@ispras.ru> --- util/bufferiszero.c | 29 ----------------------------- 1 file changed, 29 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 3e6a5dfd63..f5a3634f9a 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len) } #ifdef CONFIG_AVX2_OPT -static bool __attribute__((target("sse4"))) -buffer_zero_sse4(const void *buf, size_t len) -{ - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - __builtin_prefetch(p); - if (unlikely(!_mm_testz_si128(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_testz_si128(t, t); -} - static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { @@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info) #endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, - { CPUINFO_SSE4, 64, buffer_zero_sse4 }, #endif { CPUINFO_SSE2, 64, buffer_zero_sse2 }, { CPUINFO_ALWAYS, 0, buffer_zero_int }, From patchwork Thu Feb 15 08:14:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772883 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713462wre; Thu, 15 Feb 2024 00:17:19 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCUGmCj/m7LaDw3Xav35tLLx3XR//UOKe/1sVEaisiGI5rkB34LuDf9FaVFG2Ke4hyiMfoixga0RMXnUa2EWAkk8 X-Google-Smtp-Source: AGHT+IF6sINpowfj/PgJyXrxfP/C+ILLdjHxcLy0lQGCUXvBHZhAn/DL/uO+h3FIQzhHaEmJZbA0 X-Received: by 2002:a0c:f5c3:0:b0:68c:8208:cf71 with SMTP id q3-20020a0cf5c3000000b0068c8208cf71mr1441182qvm.31.1707985039388; Thu, 15 Feb 2024 00:17:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707985039; cv=none; d=google.com; s=arc-20160816; b=vQt08iAjuzPhYbF+kNfePtonlobu6K8dCmjwniz48qSztsAcDywbmqvCcKsEi3sKh7 RGh4QKV8AoUpptgvVepSpCH68pRBwdDzy8APeh4pRj8Lvu1VesZ83wJ0xHfrO/y6XVD2 SJ3Oh6ULSIL4N3Ko5232NAuL3R83LrJGlaccqFz1/5oA0AdYZRinwOh9XM3hF2h3WqI1 aJMyzoU6n+xFMQRBlwmBBpLiibTYgfMfVxt2KklT4pZLXnjDO0rXLMbivvweRHZXYRlA xy61CW6a2CWjA1ifLh39AhKFtg8HLgY3zbTM4a/fwFTUeGCFys3DH5pyk4e9jCz5b/5U Cg9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=FZEi/5xOp47slvhNJlppAWp41rVHAb/rr+Fz21xwhNmck/dwzz1v0zhrPX7xNYIo/C 79xx4nqV5OtKZa7cf9/i8Sy+4O9hcVuEAb95TxhmyAUZHvi8AhvO7WcBRkaaPpd9/+Bm 8q81b8f+U4pndHUhiPCin+hnW0UPU3m7N1YsY0IgghiYmQ4DnJj1Dk5ws+jzE9wJnpH+ x5gwinp9ypwGeM9AsnQ9m4NOKyKILpJ4CEu2hWf57UcxSdgb5dtUowSbL1EIVb9qs/Bz L888tkq5QdUILPEfKXMLRxUou+N4a2Q7119xj3tc2vMuXZjppamwvpNn/lbK1Y4mT+Pe BIIg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=WwqMPmy6; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id k3-20020ad45be3000000b0068c51fc0a0asi956048qvc.292.2024.02.15.00.17.19 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:17:19 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=WwqMPmy6; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu8-0007fJ-RD; Thu, 15 Feb 2024 03:15:36 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtV-00072x-P3 for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:58 -0500 Received: from mail-pj1-x102c.google.com ([2607:f8b0:4864:20::102c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtU-0001NL-2d for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:57 -0500 Received: by mail-pj1-x102c.google.com with SMTP id 98e67ed59e1d1-290ec261a61so409173a91.0 for ; Thu, 15 Feb 2024 00:14:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984894; x=1708589694; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=WwqMPmy6Kr5IbEMgHOJ3D+901EKZpQdAzh2H/J1P2EwG4jdtru6x6hlWi5hiZa+UAZ KW0M2K8zLkYXAhmUcOrnVl5cg4iVHm5JDmE3ATlju4SN1xoAIVxfLe9WdDYgQ3RaxRMs /yDkOIX6uYgd+1iIsaiV6x1UpwbAbjdifyYrhtNkvkxkxWMMZJpVJKfzZuLQWFI4ni1s jTlTM2Fl47M9eaPVudQISbc/w6DovQYuSe6WxQYFYBG6iXs6s0zs/cP1ZKqZzVFfYurp YgSNbB+LPGiwjOwDutRxX8+tFwVGYdnVmrCBbn6zJcu1l+8IfCZwXv7U/RIIkPaVGllJ GLrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984894; x=1708589694; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=B24Prdun6mjGAKNSKOeE9ZjTID6so6IF3thX9XrTBjiv5JDrA89xmowSdNE+zNYYWB nLa2LQOlli6jzl0dSwe3ke96qRCjIUfUyfmyo/+8rMBIoHjthP+QEtPVujltxebfCjsM ZThPMA/hv7Bo6hrlEKTJZDXS2h+4bGwJlCmfC+rPIEJ8oD3xiuZoF5SSl+1WkC4+054C zQYKE+NQUvNlBrUSWnIvw4zeoDRy/SX65CqAC1m40lsZRMn0/QUNpbLxPJhNipFDZhs8 +H7GyM6HkVu6gD1Pkkp9ciqQfeKVJio72ZHOQOdY6eiF7R6mdRR/NyU+jqUwiBbMsFaG E7xw== X-Gm-Message-State: AOJu0YxYdSXfEKnJ9ykRFffNTqtUW1bWsNg7hu8AciIYYFNjRI0CQGxX W4tS8D6LDfrsuPHQyZ5602XaeBMViD+hufAwZ1h10ibIzv7wa6AMin7RJeexW+3S8pbCipWcNNL b X-Received: by 2002:a17:90b:1e01:b0:298:e10b:1776 with SMTP id pg1-20020a17090b1e0100b00298e10b1776mr1136813pjb.8.1707984894651; Thu, 15 Feb 2024 00:14:54 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:54 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 02/10] util/bufferiszero: Remove AVX512 variant Date: Wed, 14 Feb 2024 22:14:41 -1000 Message-Id: <20240215081449.848220-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::102c; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x102c.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD routines are invoked much more rarely in normal use when most buffers are non-zero. This makes use of AVX512 unprofitable, as it incurs extra frequency and voltage transition periods during which the CPU operates at reduced performance, as described in https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html Signed-off-by: Mikhail Romanov Signed-off-by: Alexander Monakov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-4-amonakov@ispras.ru> Signed-off-by: Richard Henderson --- util/bufferiszero.c | 38 +++----------------------------------- 1 file changed, 3 insertions(+), 35 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index f5a3634f9a..641d5f9b9e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len) } } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__) +#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include /* Note that each of these vectorized functions require len >= 64. */ @@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -#ifdef CONFIG_AVX512F_OPT -static bool __attribute__((target("avx512f"))) -buffer_zero_avx512(const void *buf, size_t len) -{ - /* Begin with an unaligned head of 64 bytes. */ - __m512i t = _mm512_loadu_si512(buf); - __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64); - __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64); - - /* Loop over 64-byte aligned blocks of 256. */ - while (p <= e) { - __builtin_prefetch(p); - if (unlikely(_mm512_test_epi64_mask(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - t |= _mm512_loadu_si512(buf + len - 4 * 64); - t |= _mm512_loadu_si512(buf + len - 3 * 64); - t |= _mm512_loadu_si512(buf + len - 2 * 64); - t |= _mm512_loadu_si512(buf + len - 1 * 64); - - return !_mm512_test_epi64_mask(t, t); - -} -#endif /* CONFIG_AVX512F_OPT */ - /* * Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. */ -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) # define INIT_USED 0 # define INIT_LENGTH 0 # define INIT_ACCEL buffer_zero_int @@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info) unsigned len; bool (*fn)(const void *, size_t); } all[] = { -#ifdef CONFIG_AVX512F_OPT - { CPUINFO_AVX512F, 256, buffer_zero_avx512 }, -#endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, #endif @@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info) return 0; } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); From patchwork Thu Feb 15 08:14:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772878 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713266wre; Thu, 15 Feb 2024 00:16:35 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCUwL7aIv2isY5zmChy6PFzL72Bmr3KHkHvc6DDB8SQmzbBnhFXbTlxBsjSZxCzXbugPaAGBzwRk7ytuiNgXT2dl X-Google-Smtp-Source: AGHT+IE0ymUOgYslZ1ylsS69s9fp61t80O8qqbwo+NL+KaCa1C+URZl7Zlml9cBGwAiQzNkQjMMQ X-Received: by 2002:a0c:9d43:0:b0:68f:11ce:7ad6 with SMTP id n3-20020a0c9d43000000b0068f11ce7ad6mr960772qvf.20.1707984995127; Thu, 15 Feb 2024 00:16:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707984995; cv=none; d=google.com; s=arc-20160816; b=w0jvHnh0e0WNUR3MQL81joX0zKN0zcFilOHbOQgF6lPHr78A0/gQGCHdoo21JjGlE1 nJ81OvjIR7v9M0qHfWJ5eVVdS7Kubx32PQ3aKI+DLZo5ULM44y40IUzPUmn5IDZBzTlh 3ThB7aLF3brT+iLUkrPMelPovCuiJvxkSz4HnJWSoIRgStzvSJnnhZIBHclgu0nzNCJE WU4nrvEDdif+apc15t5goaPJ99GIW1yHpoJNTGif2Bk96C+kSz2+bXIHtiNXTjjt0hA/ +JWeA6GPx8i9AaSSluX8thBxYWDqHEbO8OQTPqVfLZ2gRIFX6B5qKcpljUDRJsrOOQaK qDtQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=dXCRLUcCrBeHTOJN8WRLVbCeiDdcG1rvhb6HzJR+Pi4=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=y+D5eI9ectoLKA6pKfCSEl2E1UQCHyeYWD2XkZYLgHu6F34QoGtoChA51+bEHNxYUx SCCsDSXApuJMgARv+cHn20gLQGV+Z1orfRV6VKeUTxP7p27MUncn76ltInz/7wHdAM1s 4F/ksgOj2TxhPzwvc/w8Na6EtUl+QttbfCkX0cpQQiA/YwkjYC0GW+880ijU1o1v/c8w myd9R6IZWW70OlLzr9VFwQOoDlxlWb7YpKIWLoIcS90RPK9v6Enlaadl/djNZJZogT+J qf3HsZXKxe0OGW609XZY+zGOD2VILbJTv5lwEAGd0n8iU5DxFMtFyg68eGfk1eVhtuJB Hr/Q==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="fN/YWgu4"; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id fn11-20020ad45d6b000000b0068ca70ec311si966347qvb.167.2024.02.15.00.16.34 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:16:35 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="fN/YWgu4"; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu9-0007iw-U3; Thu, 15 Feb 2024 03:15:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtX-00076g-9n for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:59 -0500 Received: from mail-pg1-x52d.google.com ([2607:f8b0:4864:20::52d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtV-0001Ne-CU for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:14:59 -0500 Received: by mail-pg1-x52d.google.com with SMTP id 41be03b00d2f7-53fbf2c42bfso488195a12.3 for ; Thu, 15 Feb 2024 00:14:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984896; x=1708589696; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=dXCRLUcCrBeHTOJN8WRLVbCeiDdcG1rvhb6HzJR+Pi4=; b=fN/YWgu4I2uogIlX4DZDGhuCci0c2r3g076Fp/U8f/yy+1HZnu0S6G7JQ7ZxtkY64v ZGsCpbjyRpFrRW8qH50UatrUfjcVGZvyl0iQgb3uYlfTy9KQ2TuwX5W97NhguO9DT31I 6aI3nm//CMxpOkHH69OWK7CvlziPMOliTX/R7RXaa46K65NofHJzjN5BCAMDIKdaIqVh jPuJn7ALwi2rS5rPdmVvhUjay1tSCZtKPY4Gekg04Rz+c9W1maGWiKXSYObyai5aL7Lw HzMa2Km3rAQhMFsVluMeNCSxi4xFpJYcKh0NpgNt/EjXuCYCXbZbxlWB12kBcvyu0lKl tf4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984896; x=1708589696; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=dXCRLUcCrBeHTOJN8WRLVbCeiDdcG1rvhb6HzJR+Pi4=; b=czfzXTCvINMkJ1BJc7gjJD6g3IyPxrjYcpyjdjN4klvhMcq/DMJyoUqeYXrbLFJjbT 7TrjikHw7cmjeS66OSHOq7mr40lC+a0Q9CK6IOiXqM7EEflzs4ZDktQXdnEVPN11HQjk gpEynIVeOYcZsVHn/KRAz5sZ05seFWLz2FoqG8SzPSuSYSCW+1B0fkZssF4tZwXtUM+N MdaufpL2bfHQ3oO36r+7tTf5gEWrBYHhuszEx2ZSqKYAfi7alGi14+1WZ22DJcoVVgGW nNck4zWIgMI7UWtj2meU2DVgEcc9DSRym09U9Awdvt3S3Yj1Qnf5htVx4dVk1u8dCyRW 0LQw== X-Gm-Message-State: AOJu0Yygc7TxkqkDgufih2J1Us3sUqSvvcRY7KpdR7S0VNXI/31pli+M 1jLkTbrdfu0s7CqOXsAJlowgnoeutTi9c7QzuQBOjIKEhIKBHJ/OWEPludIAjHhKNcy5X4jlVdZ h X-Received: by 2002:a05:6a20:a195:b0:19e:425e:ec56 with SMTP id r21-20020a056a20a19500b0019e425eec56mr980332pzk.24.1707984895996; Thu, 15 Feb 2024 00:14:55 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:55 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 03/10] util/bufferiszero: Reorganize for early test for acceleration Date: Wed, 14 Feb 2024 22:14:42 -1000 Message-Id: <20240215081449.848220-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::52d; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x52d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Test for length >= 256 inline, where is is often a constant. Before calling into the accelerated routine, sample three bytes from the buffer, which handles most non-zero buffers. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Message-Id: <20240206204809.9859-3-amonakov@ispras.ru> [rth: Use __builtin_constant_p and perform the sample out-of-line.] Signed-off-by: Richard Henderson --- include/qemu/cutils.h | 15 +++++++- util/bufferiszero.c | 89 ++++++++++++++++++------------------------- 2 files changed, 51 insertions(+), 53 deletions(-) diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h index 92c927a6a3..36f8cfa0e9 100644 --- a/include/qemu/cutils.h +++ b/include/qemu/cutils.h @@ -187,9 +187,22 @@ char *freq_to_str(uint64_t freq_hz); /* used to print char* safely */ #define STR_OR_NULL(str) ((str) ? (str) : "null") -bool buffer_is_zero(const void *buf, size_t len); +/* + * Check if a buffer is all zeroes. + */ + +bool buffer_is_zero_ool(const void *vbuf, size_t len); +bool buffer_is_zero_ge256(const void *vbuf, size_t len); bool test_buffer_is_zero_next_accel(void); +#ifdef __OPTIMIZE__ +#define buffer_is_zero(B, L) \ + (__builtin_constant_p(L) && (size_t)(L) >= 256 \ + ? buffer_is_zero_ge256(B, L) : buffer_is_zero_ool(B, L)) +#else +#define buffer_is_zero buffer_is_zero_ool +#endif + /* * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128) * Input is limited to 14-bit numbers diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 641d5f9b9e..38527f2467 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,8 +26,9 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool -buffer_zero_int(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t); + +static bool buffer_is_zero_integer(const void *buf, size_t len) { if (unlikely(len < 8)) { /* For a very small buffer, simply accumulate all the bytes. */ @@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -/* - * Make sure that these variables are appropriately initialized when - * SSE2 is enabled on the compiler command-line, but the compiler is - * too old to support CONFIG_AVX2_OPT. - */ -#if defined(CONFIG_AVX2_OPT) -# define INIT_USED 0 -# define INIT_LENGTH 0 -# define INIT_ACCEL buffer_zero_int -#else -# ifndef __SSE2__ -# error "ISA selection confusion" -# endif -# define INIT_USED CPUINFO_SSE2 -# define INIT_LENGTH 64 -# define INIT_ACCEL buffer_zero_sse2 -#endif - -static unsigned used_accel = INIT_USED; -static unsigned length_to_accel = INIT_LENGTH; -static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; - static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - unsigned len; bool (*fn)(const void *, size_t); } all[] = { #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, 128, buffer_zero_avx2 }, + { CPUINFO_AVX2, buffer_zero_avx2 }, #endif - { CPUINFO_SSE2, 64, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, 0, buffer_zero_int }, + { CPUINFO_SSE2, buffer_zero_sse2 }, + { CPUINFO_ALWAYS, buffer_is_zero_integer }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { if (info & all[i].bit) { - length_to_accel = all[i].len; - buffer_accel = all[i].fn; + buffer_is_zero_accel = all[i].fn; return all[i].bit; } } return 0; } -#if defined(CONFIG_AVX2_OPT) +static unsigned used_accel; + static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); } -#endif /* CONFIG_AVX2_OPT */ + +#define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { @@ -194,36 +173,42 @@ bool test_buffer_is_zero_next_accel(void) used_accel |= used; return used; } - -static bool select_accel_fn(const void *buf, size_t len) -{ - if (likely(len >= length_to_accel)) { - return buffer_accel(buf, len); - } - return buffer_zero_int(buf, len); -} - #else -#define select_accel_fn buffer_zero_int bool test_buffer_is_zero_next_accel(void) { return false; } + +#define INIT_ACCEL buffer_is_zero_integer #endif -/* - * Checks if a buffer is all zeroes - */ -bool buffer_is_zero(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; + +static inline bool buffer_is_zero_sample3(const char *buf, size_t len) +{ + return (buf[0] | buf[len - 1] | buf[len / 2]) == 0; +} + +bool buffer_is_zero_ool(const void *buf, size_t len) { if (unlikely(len == 0)) { return true; } + if (!buffer_is_zero_sample3(buf, len)) { + return false; + } + /* All bytes are covered for any len <= 3. */ + if (unlikely(len <= 3)) { + return true; + } - /* Fetch the beginning of the buffer while we select the accelerator. */ - __builtin_prefetch(buf); - - /* Use an optimized zero check if possible. Note that this also - includes a check for an unrolled loop over 64-bit integers. */ - return select_accel_fn(buf, len); + if (likely(len >= 256)) { + return buffer_is_zero_accel(buf, len); + } + return buffer_is_zero_integer(buf, len); +} + +bool buffer_is_zero_ge256(const void *buf, size_t len) +{ + return buffer_is_zero_sample3(buf, len) && buffer_is_zero_accel(buf, len); } From patchwork Thu Feb 15 08:14:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772880 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713343wre; Thu, 15 Feb 2024 00:16:52 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVVH8FEUDIvPxFG+XxjBVmfyZkwTLrLphQEkjD0IYgrki1TNTEo20bLHPthDLnQZngVjjYEQhi3Uw4ScYTpeWVa X-Google-Smtp-Source: AGHT+IGQOj/HrpA82qYrmFX53S+D/7Hoj7HjHArHJaru12To6piAHAuZFXdRRowAph4oNrmEJHQ+ X-Received: by 2002:a05:620a:b1b:b0:785:ca4d:2a3a with SMTP id t27-20020a05620a0b1b00b00785ca4d2a3amr1069016qkg.77.1707985011825; Thu, 15 Feb 2024 00:16:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707985011; cv=none; d=google.com; s=arc-20160816; b=f02jFGGWiE1vj+TyeglUKNQ2PR99zfQHyjjx2iNZAarG03oG442pidxC3HjogJa5fy 6/ucCjg3vQZjX4tvENUNmxYK3Lm2od0boT6NQ/TKput7jPr+0sCClVewZkwvKGuxsi6N fJe7VvsMtIk93h+qcjRttV/kQPTjGTaJaTzbXh44vAqGJGEfOCzyo3IBIfH4fuhAX6PF vsYu+Y1C4zYiachESAj+2cLMIDJFUtGoFy6ZLlKtVfe1tImc87GzJleGPLG4PgoYNTqc bYAyjP6zvh2L5RIjT2O1DTTZkEMH99Wcs37gMsGfM610/6d0Xzi/hxVTNIHGElUxN1FE UEzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=VgkwjOwkXeKSpsaCNpWivSMzmaRHOTSXYP4mzjDgWYo=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=QcfVpngC5MQ3MvSchNq8J2ExdAzQPihr8IFxhj3vwZaaLH7LO1ZpN8uyFwJwOjRgZ2 w/2hUPILtNgRrGGgfVv1Zy2GykCYMhsh93C37yTnrqfpurGfbsk/PtVQhhLRWfucWmLY YW42qRhxsR9pHKXYmulFW39Gu8ys5yCECV4tOStTXDp3hMXt97Z6kpHdw5asIY7JX581 hTYB0+UQDhmwBPi+YGRSOF2ubZYWIj+Nz2iqYl1GMpH91QtYLS5FbHedY3VUyDE+jJjs DE2Sy+E6aZ0Pxi/NEfvZPOMCKkfYilZ1NrU23jHqb0fhQZ/FFfXMLCJrdiutUWGzdTEK qaVQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=U3PDUSXX; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id bk8-20020a05620a1a0800b00787271f2b1bsi1065841qkb.106.2024.02.15.00.16.51 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:16:51 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=U3PDUSXX; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu7-0007b3-R9; Thu, 15 Feb 2024 03:15:35 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtZ-0007BA-0Y for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:02 -0500 Received: from mail-pj1-x102d.google.com ([2607:f8b0:4864:20::102d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtX-0001No-El for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:00 -0500 Received: by mail-pj1-x102d.google.com with SMTP id 98e67ed59e1d1-296c562ac70so525664a91.2 for ; Thu, 15 Feb 2024 00:14:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984897; x=1708589697; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VgkwjOwkXeKSpsaCNpWivSMzmaRHOTSXYP4mzjDgWYo=; b=U3PDUSXX7ASKeyKrx7u2i7lRXYRDOkmyKhP4s9QRep0cTnZL+E636VURS7wO/NX4hh 0ri8EghIHBvrPB1GL2+f2aHgMK6BVmGsuo/4H1AOp7lktXiGmHX7egobiw1IC0/zxVut V2nRCbzKv1zM7eKX6wqukGffnYDI+ch+fnn643/+9UXTN4T6bcbYb45UHrAMqqCMvYN5 H97dcecGvX6wXFsCVLTTsxHEjWPcekit4zGhkXMhsDSCzgNP75rFJ3vKyt7ZVUwTeUCi Eybk4RilIwzFeT1W/s9J63ae26lKfIiaqdceAn3l+aLIYAu51rQUgDquDNZqFu8BOPQq C26A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984897; x=1708589697; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VgkwjOwkXeKSpsaCNpWivSMzmaRHOTSXYP4mzjDgWYo=; b=YUwPqN1Ass3R1dCkQs3zo7fyeVsx2rA6GqJltHNnJs3zAapSCabl1WF6mHu9RTQszT tNgWIifFZckrmn0jiIGWTPGeWonJ8Z2apfTTyA+R2k23ckTIHDqF/Tjmnc+egwWBXfhG bCvNRTqkYFXpW71Zu6W4O5wWY2ZoujrNl7ut8uXC/CCFJDL4/leAghZJGhTEB+2Ir8+w pXaziebKgltMMzqeTxrhT36y5Tbrf8EIIjM8tQBpftm+PbaJSxzQojPMWLYnyPDTc1vV phGRQle9ORLKOq4gv2XscJYhpt50VCbpAKGlKZpkrMY0GFJhzCBJiq+sHQO5pUoeINMR qnJg== X-Gm-Message-State: AOJu0YzFE/O1K7njPuqEd11dONRXa2Uu8FRcHgZoEmXBKCv5fjvgIs+p Ld6Dnip9oly7O7mevqeZh/XWgVak8t89JUpnSXPEz/tpl9myxZgIq78hlEQJ7K6RoK5gBGZMRwE n X-Received: by 2002:a17:90a:4381:b0:298:c136:2ffc with SMTP id r1-20020a17090a438100b00298c1362ffcmr860621pjg.45.1707984897218; Thu, 15 Feb 2024 00:14:57 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:56 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 04/10] util/bufferiszero: Remove useless prefetches Date: Wed, 14 Feb 2024 22:14:43 -1000 Message-Id: <20240215081449.848220-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::102d; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x102d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Use of prefetching in bufferiszero.c is quite questionable: - prefetches are issued just a few CPU cycles before the corresponding line would be hit by demand loads; - they are done for simple access patterns, i.e. where hardware prefetchers can perform better; - they compete for load ports in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-5-amonakov@ispras.ru> --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 38527f2467..6ef5f8ec79 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); for (; p + 8 <= e; p += 8) { - __builtin_prefetch(p + 8); if (t) { return false; } @@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len) /* Loop over 16-byte aligned blocks of 64. */ while (likely(p <= e)) { - __builtin_prefetch(p); t = _mm_cmpeq_epi8(t, zero); if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { return false; @@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len) /* Loop over 32-byte aligned blocks of 128. */ while (p <= e) { - __builtin_prefetch(p); if (unlikely(!_mm256_testz_si256(t, t))) { return false; } From patchwork Thu Feb 15 08:14:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772884 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713466wre; Thu, 15 Feb 2024 00:17:19 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVmP5DqgeR5/5yU4VIuQhYb9hsAktA8Qx2fLi2rsjC+9yh98tOQgYnY+VwLGI1hA94SKqc0aOiU/D2aAIAoLneK X-Google-Smtp-Source: AGHT+IFAW1Lo31KSihN9nzgFDpCCFwGAk+21wIup/+i95m9vDmsBlDxtFy6awtz36xV1P+NqSKHa X-Received: by 2002:a05:6214:5190:b0:68e:e76a:744 with SMTP id kl16-20020a056214519000b0068ee76a0744mr1315074qvb.31.1707985039760; Thu, 15 Feb 2024 00:17:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707985039; cv=none; d=google.com; s=arc-20160816; b=SghBitatVx3YXyM9Trho525LTdcynz8qsi00PSHIhfKYnQSE9P4TIQF9ImIvuoPzkF gMIOxySb/fysfvqjNVBBknlqeyWzia7cwxPQyQEt5khjqkZ/rHGBeOpERYwayeSds0Hj WrW0vGUf9R51mmKjAFYF8SrbZs4qs54aLbc2Cnt7TffDrFxpkReX5pPHc4rIXohpF1eu A0HK1fP3A8BUf5JdzzvZn3469fuqDdIbrrRhVBi+toeTDEkQNBrOfqfms33H46Q/lKYn okTgplQUaq0KokogsUlGEfWH1E3yC4ptegdWxLaM53r6EQuPuMDPqEqpuxkrYi+2Z7ti Hgvw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=o1WrtT24LEWJ2UE0G4+JsaaZMny8aXncXNU55PndN9g=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=RNpg0hw/82CW7BQ3JVO13DnxMLigTJoiD53XFbX6KyVvlrvD9WGJyCwQQXbzuyI7gK Y6O6qWLnrBRES+sgfzWrAfDBZHJfJ2/+HlepoQDR7YLkvfMgHYT7EKFh7pDfdktOniPW g0c88ji/JwoC6CG7Ad4ikFn6s6Z8V1T2r2pe3xGgytuR4w9pO4CTpMmX4ri0zR25t2yh Pz3pnIZQgrgYNopb1Spb241q/s5FbubiSzSVnYcwG/d7iX9EFBPWfIKwToGZLi9R9ewL aWg597jYtaY0SKVAQr9JWnMETKq7PZq3sLk6r5I5c59cZkd3m0eiKRuPaEv5ymn1T8dm pKYQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=auiEAbsU; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id c1-20020ad45ae1000000b0068cd5455df1si996298qvh.136.2024.02.15.00.17.19 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:17:19 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=auiEAbsU; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuC-0007mX-Bs; Thu, 15 Feb 2024 03:15:40 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtZ-0007Bb-GN for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:03 -0500 Received: from mail-pg1-x52a.google.com ([2607:f8b0:4864:20::52a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtX-0001O5-Or for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:01 -0500 Received: by mail-pg1-x52a.google.com with SMTP id 41be03b00d2f7-5d3912c9a83so497968a12.3 for ; Thu, 15 Feb 2024 00:14:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984898; x=1708589698; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=o1WrtT24LEWJ2UE0G4+JsaaZMny8aXncXNU55PndN9g=; b=auiEAbsUS+da5ViS/dPNXxUTP12mUuVl4Ihtgw9dOzOwPs0p6twZkjSZMJJDKwdk80 Ij/9DI5v3S7ygfdIGecRokFz120KDWZGwIfiypRZVHToXcF6E2NinWzzHNpzrtMA3ADo IYDcYfoqVMfRbpT+CWnYvVSMoZO0pJoafK4iUbYIgIgbAIv+CkHleWJ4FWROcKq53PnP F8Zu8O1QbgzcYaQRkK1AbOnDauxAeMMw/oYzbUtjYoZW5MhInR3sxgBjVlaM3EaG1Rr2 TeK/WLKTGaiCbJP/yJEI/23wYPQHleQibFqGSZqnn6VzbzKwcBWSkjDM5PiUX5tnCM/e pvVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984898; x=1708589698; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=o1WrtT24LEWJ2UE0G4+JsaaZMny8aXncXNU55PndN9g=; b=hsTSQu3TpF5vgKXYQSCxfZZDMvejhvn4VTiHwwfwY3rq1Yp0/SDrMvMM/c2ywSZPwk tTBrdp55FaUJj7+qW5R7qjASI9csaTDiU2CpTePD5IeFaWtEDmw0bT1ybz+lMAgrWdXj 8VqYaxqg0ySrDR+L7/PSO1WSxhM5CEP4ONggiUMFdMyLwePFtNnOKOGbN/8UvChhvpp/ FfYXiVGOyJVKv01oeY7SSAdRTjBFaZ3fYt16ytNBjJ34HIKHrTMMBnOPY2BsiXm/gRpP PNbdbLxzwmquvjxz2t2TF90im2+NC5wnl/ovOi5H3hMn3mcew5oSRvEGr4LIX+Oorp77 QU7A== X-Gm-Message-State: AOJu0YwRNsAaGcgFGkdiQ+udchB7jeQeaQWEq4x4siiUyTli9Xm/SxS4 qij8O3T2n4NxLTfn5EeKSA8jahEr3O4lTVk6fdOidmrDDgtMSeDhzOgIdC76LZg2Mck542pNY5u b X-Received: by 2002:a05:6a20:d70f:b0:1a0:686b:afdd with SMTP id iz15-20020a056a20d70f00b001a0686bafddmr1262724pzb.5.1707984898453; Thu, 15 Feb 2024 00:14:58 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:58 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants Date: Wed, 14 Feb 2024 22:14:44 -1000 Message-Id: <20240215081449.848220-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::52a; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x52a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations). Avoid using out-of-bounds pointers in loop boundary conditions. Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-6-amonakov@ispras.ru> --- util/bufferiszero.c | 111 +++++++++++++++++++++++++++++--------------- 1 file changed, 73 insertions(+), 38 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 6ef5f8ec79..2822155c27 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include -/* Note that each of these vectorized functions require len >= 64. */ +/* Helper for preventing the compiler from reassociating + chains of binary vector operations. */ +#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1)) + +/* Note that these vectorized functions may assume len >= 256. */ static bool __attribute__((target("sse2"))) buffer_zero_sse2(const void *buf, size_t len) { - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - __m128i zero = _mm_setzero_si128(); + /* Unaligned loads at head/tail. */ + __m128i v = *(__m128i_u *)(buf); + __m128i w = *(__m128i_u *)(buf + len - 16); + /* Align head/tail to 16-byte boundaries. */ + const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + __m128i zero = { 0 }; - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - t = _mm_cmpeq_epi8(t, zero); - if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + v = _mm_cmpeq_epi8(v, zero); + if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + p += 8; + } while (p < e - 7); - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; + return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF; } #ifdef CONFIG_AVX2_OPT static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { - /* Begin with an unaligned head of 32 bytes. */ - __m256i t = _mm256_loadu_si256(buf); - __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32); - __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32); + /* Unaligned loads at head/tail. */ + __m256i v = *(__m256i_u *)(buf); + __m256i w = *(__m256i_u *)(buf + len - 32); + /* Align head/tail to 32-byte boundaries. */ + const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32); + const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32); + __m256i zero = { 0 }; - /* Loop over 32-byte aligned blocks of 128. */ - while (p <= e) { - if (unlikely(!_mm256_testz_si256(t, t))) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* Loop over complete 256-byte blocks. */ + for (; p < e - 7; p += 8) { + /* PTEST is not profitable here. */ + v = _mm256_cmpeq_epi8(v, zero); + if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } ; + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + } - /* Finish the last block of 128 unaligned. */ - t |= _mm256_loadu_si256(buf + len - 4 * 32); - t |= _mm256_loadu_si256(buf + len - 3 * 32); - t |= _mm256_loadu_si256(buf + len - 2 * 32); - t |= _mm256_loadu_si256(buf + len - 1 * 32); - - return _mm256_testz_si256(t, t); + return _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) == 0xFFFFFFFF; } #endif /* CONFIG_AVX2_OPT */ From patchwork Thu Feb 15 08:14:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772881 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713367wre; Thu, 15 Feb 2024 00:16:57 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCUxKw9lEBWl6Df2hiGgRK2NcEwhe0RfxT7cibab2ICBQYOflMjwCPGqqdBrxx8Ry9qUJmUqt/rhNEONUNN1efsV X-Google-Smtp-Source: AGHT+IGGKJ5yDyKI9TbA2W+RsJ68nLnf1BeYxtBK0hpoQrhuWZkD06Rytsd0GIViTVZ2gpcKnHLF X-Received: by 2002:a05:620a:cc8:b0:787:1fb5:7dd5 with SMTP id b8-20020a05620a0cc800b007871fb57dd5mr1126623qkj.46.1707985017436; Thu, 15 Feb 2024 00:16:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707985017; cv=none; d=google.com; s=arc-20160816; b=d0fRLWk1hA68UmtYDF9uvWgvQq+PLzCn2EWsFOaktbO31Q2V3AX015yPIfrs22hDJ6 wTrq8MTNVAjOaeMPi3aE/gKP+biJ5PXTykvS6NfOx6K/feDZEOiCUmRTNpwyrgWa3BRj ii5u1qo33i4HyeWwaP3NHFovvadLSzcRndrEC2z5VycZyhrfVQcQ9lYB3fBfe93ICASM d4E6tVrKmTC4i3MQfzuas11awo+G70CsC0fITk/jJyPUPz4E6Z6KFvhv2gVZcznnz5hr 1n52xSpNRKIcHvfB4h5uCbET7TQrEVrAPwG8CzNCWZjlbLHhmM4Hd4SDGiFxbdnEVGj7 C4GQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=EEYj0I5Vz7Ix2So9xIua9K5UZDr6lKDlAW8K2SObdT4=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=Z1MpCm7e/60Qud1YFGO0xzm45mI5/MNiDm9sw7rTP504FCK9C0PSKhEgb1vx1G6Nu9 X5qY02xkZGRhH6YQ6nDHnfAYX8LYAhgC5sSwPikvtvTzX3mji9KE1nMfinjLHCxXjxCK sIeYolLimRyc9nbziY/531fNiqQzlj4NSol4Ip91lxXFzIa4cjte0V8xsNzlgrn01g1V 5KtvDLPzrXlb2O7O8xBoqau9M2/mryA95kVdEKie2ynHX+6QIzdxC2qrAkpanrzTtg3t j6hOrJ2rnIDlfpteGinCncg9xeov2TQzU2z/zGBP6NDHYmJcbSlMBviHS62YzohkPf5B LGkQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=hklfNXwg; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id l23-20020a37f517000000b007872668496dsi941370qkk.268.2024.02.15.00.16.57 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:16:57 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=hklfNXwg; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuA-0007jU-49; Thu, 15 Feb 2024 03:15:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtb-0007Cd-3y for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:07 -0500 Received: from mail-oi1-x22c.google.com ([2607:f8b0:4864:20::22c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtZ-0001OY-8c for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:02 -0500 Received: by mail-oi1-x22c.google.com with SMTP id 5614622812f47-3c132695f1bso481388b6e.2 for ; Thu, 15 Feb 2024 00:15:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984900; x=1708589700; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=EEYj0I5Vz7Ix2So9xIua9K5UZDr6lKDlAW8K2SObdT4=; b=hklfNXwg1fcyK0do01I1ggFKQiV5W+zkjiAM/cdWkcEFjZ1D2vRtg7+OWYPBUcvrCu H84lYwspOEypdoqqPoF8limVeVwUIv9hpP0RHR+k7FxrGgfKgOE3cLNsyYeSvGOoBChV GwtHW8R/GiaeW4WDogl7naR+aKy+VzgFfFWbFGLRAh6M4+LqYjh/iNziY6v7wmjEUTZ1 0lCG4h2E4HrbuWjbnTZdHaCcl0OukeHv8io2BqGOJDykMnxFcHH0CtEq7wy6ADxLT31J /mQAbc5AKyWSUFQv/X117tvVz9nACNccVqyWEMpRuIrlOCZXDGb+lXmFxKSzaGKrvb6T vQzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984900; x=1708589700; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EEYj0I5Vz7Ix2So9xIua9K5UZDr6lKDlAW8K2SObdT4=; b=U+2Wt6ghFS7zcwBOPyDMJfYgiFgL5+wPNmNTzSV5o91ITH/S8XRisNuk8t4GANUJnU eGtlknRCUsyNjLbJsS5hg0vR1dyDaNc3TJ/0FgYDTIfBlYhM2BDflqnS6kYe/aGWVTP7 UFj8Ken0CeFIMOZMr7rNvl+oHIzCreaDsgnN2oNV9Jr1Yhv1ZIToSbdl12iPwet/BSFA 5PIkJ9cxLMMqkDaWvM+DHREctsjitgUsZwT/HWt20s84Acm1w/R9tqnNqkdVtQaGJyQW dpcLNF0RA75woMLGZFpfNwu4S42z6ZllqDFxrnCjus5AEZxPR4KYF1Rp7xKW0uNBccRY FHng== X-Gm-Message-State: AOJu0YzWWuEerkF/QzDPCJCPhPXcBoFypEZ+0UfWP/2CP3bI2Nh1bbJB Dh7+YtzZJY+jdAlmgic/419vcUMOmuC6et6pvEdMIyIPdrV552imjn5iAGkflgGLCSv5ClxBfre 0 X-Received: by 2002:a05:6358:885:b0:176:5d73:34ef with SMTP id m5-20020a056358088500b001765d7334efmr933588rwj.24.1707984899769; Thu, 15 Feb 2024 00:14:59 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:14:59 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 06/10] util/bufferiszero: Improve scalar variant Date: Wed, 14 Feb 2024 22:14:45 -1000 Message-Id: <20240215081449.848220-7-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::22c; envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x22c.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Split less-than and greater-than 256 cases. Use unaligned accesses for head and tail. Avoid using out-of-bounds pointers in loop boundary conditions. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 86 +++++++++++++++++++++++++++------------------ 1 file changed, 52 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 2822155c27..ce04642c67 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -28,40 +28,58 @@ static bool (*buffer_is_zero_accel)(const void *, size_t); -static bool buffer_is_zero_integer(const void *buf, size_t len) +static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { - if (unlikely(len < 8)) { - /* For a very small buffer, simply accumulate all the bytes. */ - const unsigned char *p = buf; - const unsigned char *e = buf + len; - unsigned char t = 0; + uint64_t t; + const uint64_t *p, *e; - do { - t |= *p++; - } while (p < e); - - return t == 0; - } else { - /* Otherwise, use the unaligned memory access functions to - handle the beginning and end of the buffer, with a couple - of loops handling the middle aligned section. */ - uint64_t t = ldq_he_p(buf); - const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8); - const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); - - for (; p + 8 <= e; p += 8) { - if (t) { - return false; - } - t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; - } - while (p < e) { - t |= *p++; - } - t |= ldq_he_p(buf + len - 8); - - return t == 0; + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer, with a couple + * of loops handling the middle aligned section. + */ + if (unlikely(len <= 8)) { + return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0; } + + t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + while (p < e) { + t |= *p++; + } + return t == 0; +} + +static bool buffer_is_zero_int_ge256(const void *buf, size_t len) +{ + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer, with a couple + * of loops handling the middle aligned section. + */ + uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Collect a partial block at the tail end. */ + t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1]; + + /* + * Loop over 64 byte blocks. + * With the head and tail removed, e - p >= 30, + * so the loop must iterate at least 3 times. + */ + do { + if (t) { + return false; + } + t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; + p += 8; + } while (p < e - 7); + + return t == 0; } #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) @@ -173,7 +191,7 @@ select_accel_cpuinfo(unsigned info) { CPUINFO_AVX2, buffer_zero_avx2 }, #endif { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_integer }, + { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { @@ -211,7 +229,7 @@ bool test_buffer_is_zero_next_accel(void) return false; } -#define INIT_ACCEL buffer_is_zero_integer +#define INIT_ACCEL buffer_is_zero_int_ge256 #endif static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; @@ -237,7 +255,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len) if (likely(len >= 256)) { return buffer_is_zero_accel(buf, len); } - return buffer_is_zero_integer(buf, len); + return buffer_is_zero_int_lt256(buf, len); } bool buffer_is_zero_ge256(const void *buf, size_t len) From patchwork Thu Feb 15 08:14:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772876 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713142wre; Thu, 15 Feb 2024 00:16:11 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCV26g/DePTdcxjeDJAkSFJprBDhHFgzODCaXSckaF0TuH0mE8GPtzHhMsRsu52lWZFlpboeaakLTc+vgjHbFazd X-Google-Smtp-Source: AGHT+IEFTobfbWr+ioB8cPzNxpI4sWo16AGW+tWpaAFxE4Unhj7iGrMSG2aIgI5gx2TYHeqgPf0n X-Received: by 2002:ac8:5716:0:b0:42d:afd3:7b69 with SMTP id 22-20020ac85716000000b0042dafd37b69mr1361269qtw.63.1707984971523; Thu, 15 Feb 2024 00:16:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707984971; cv=none; d=google.com; s=arc-20160816; b=NMTMfHpi+v9zzlat8dTgMXtdUK16Y92kQQJNUohp+QsVHlV7lNOXOKrv1e9dkC51Xu iYcGdGxsOEG8NAJwp1gvPOtEcwAKWvOnTdEc9mWNNOqsntgB4CFHusRprs/cTjdzlCDm 1pwq3CjnyGpyUG3rN9C66rnF9qAq9jxXi4orpyGab490CEIsNUHBf2dTqba5LedLoNTP fN8Me+OOxh/dz3+SvnxAaIOCcwkRjWt4NbP0icYzIe/X8Bv+UuZ2+GUonCZirMaBivBq cZ/D6ZQbUUZXhdUbBPuO5i4OcsdDd+Z804y0JvK02Y9Yo4Y15j7WYeAkweH79z8GJml3 5cWA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=+5kI9i1T4g1qlGS3lxQuuu2LwuYBA3jGXjmyocvfHSc=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=ikwFaKf/jOqN/c9qVk/BNhEP2avfY7stiJBBIZNkOu8mW9xYAyP7DDP3XfJsXqM4KD RwIvc/5dZdAOXKs7CJ3kpPO5J8n/mWBG8YwYDFnYEtLUcnIvvh2iA6W6BMdy392vPsKA wHVicV1t9mn0n6rnQMrHmnkn+CXmiONPo74V/LE1vXRQvsPeH9q7ZVBu8xfAtpkvP4U5 a/HBwAMoyp9Pj5P4C05ksXoBCR4cwkk3bX/uYF+jD5f/JQe8MR8IYi2F16kWw8P6yadt SmcBvnXdIJ5cVLjzzn1WZDMhNqIKHDcw3HbxgPgfoJr7r4364kKaBm7TE9U0jH/Y5CVM m6vQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=bNRaOfVo; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id jv9-20020a05622aa08900b0042c50fceedasi831137qtb.441.2024.02.15.00.16.11 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:16:11 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=bNRaOfVo; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu9-0007hR-FJ; Thu, 15 Feb 2024 03:15:37 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtb-0007Cj-PI for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:06 -0500 Received: from mail-oi1-x22b.google.com ([2607:f8b0:4864:20::22b]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWta-0001PV-4M for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:03 -0500 Received: by mail-oi1-x22b.google.com with SMTP id 5614622812f47-3bd72353d9fso450614b6e.3 for ; Thu, 15 Feb 2024 00:15:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984901; x=1708589701; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+5kI9i1T4g1qlGS3lxQuuu2LwuYBA3jGXjmyocvfHSc=; b=bNRaOfVoxaup6DdFdTBI/FTE4rnKVKD2x3aHfXpQhC27t5Jx2az7s5ItLAPAgIEzX8 6sJqvvCDt5CL4dy2fVO4u07hvcl46scWgIpf/C4EyuQbguJYuFTRfn/qSErW7Q8LQKao +g4M7i3lx4kwgBFu0J84mml2mGOSWuECZ4KOZCU28e1FGPfRH4KB5vvRP/9IGBE2M27L OLYMANsL3K92mW96tqQ7flgZEaa6/595TFE0h8T/r2AuzBW7lwN+0e0KrZwZ/BnVWiWi O/ecElTi+IgIVpb6SO374SWqyzPIbCbXNrj9B+paPiXwAeZaYhJKExq2CORN+vRCLECW n1ZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984901; x=1708589701; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+5kI9i1T4g1qlGS3lxQuuu2LwuYBA3jGXjmyocvfHSc=; b=BFdBjqY3S9MFjoG3RHS/+nHQLXGFRG0ONzh6g3ITFLW4FRkjJWu9X6qn2RSorKJ1Yv Wx/H6PRwezTfWxDeRxebi0u1g6FrdRJQWVzI4ny8nVvGlL9+codVzQ3AFQa+gT1uJtks 8MPgtw3HViFH7/geq23W6AwgG+AbWH47EuYF9yDdTEDFamFK/LmYIOWWRPROWbdemdzA lv4eh2rd9/UhVVc2El7wKAMKlN/BH+6klfG+EH8U/DiRgVOGdPkTP1sQXPKq8eAWxym0 2hQW8NRzSgIpJ8mvridmUt/5yih1nPPfHYxhnEdLQhcnAmGi0VsTjGBSRWvd2FThGTX/ ehtw== X-Gm-Message-State: AOJu0YzCLFSy2M2tmZDrEBNToLX6TPICHE/AQAIWz/LkcgwVe1qJqaas 0tRGXGGmZwe7H+xCtf7y36DsZclWBsttabXW4LkVyHdAlInldPza3Wsf9pVF7w07o9+TfH1Yi2f A X-Received: by 2002:a05:6358:6f0b:b0:178:688e:fb21 with SMTP id r11-20020a0563586f0b00b00178688efb21mr1068454rwn.7.1707984901019; Thu, 15 Feb 2024 00:15:01 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.14.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:00 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 07/10] util/bufferiszero: Introduce biz_accel_fn typedef Date: Wed, 14 Feb 2024 22:14:46 -1000 Message-Id: <20240215081449.848220-8-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::22b; envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x22b.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index ce04642c67..ce80713071 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,7 +26,8 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool (*buffer_is_zero_accel)(const void *, size_t); +typedef bool (*biz_accel_fn)(const void *, size_t); +static biz_accel_fn buffer_is_zero_accel; static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { @@ -179,13 +180,15 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ + + static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - bool (*fn)(const void *, size_t); + biz_accel_fn fn; } all[] = { #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, buffer_zero_avx2 }, @@ -232,7 +235,7 @@ bool test_buffer_is_zero_next_accel(void) #define INIT_ACCEL buffer_is_zero_int_ge256 #endif -static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; +static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL; static inline bool buffer_is_zero_sample3(const char *buf, size_t len) { From patchwork Thu Feb 15 08:14:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772882 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713395wre; Thu, 15 Feb 2024 00:17:02 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCX71IoXlfdHK9H75gH57N0MPw2m5ckhJ/EnuFZDqUCAGs8EKc08rrCD8cMHPHkZRKwLmJPqY8ii8Ei5BjAOjAXp X-Google-Smtp-Source: AGHT+IGqb6kkbban/jh1Jm3ocjCdvGY2SvYUtIrJGq7+ccGrpWzTnYoIz2QCltmtbTFU4vdSZwqE X-Received: by 2002:a1f:db04:0:b0:4c0:e4d:6b2 with SMTP id s4-20020a1fdb04000000b004c00e4d06b2mr803830vkg.8.1707985022435; Thu, 15 Feb 2024 00:17:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707985022; cv=none; d=google.com; s=arc-20160816; b=BWTQZc9v95EkE5EAqNByvJG1VlOofLpRXNFwXIk7lqKMn7fu5t8DGwXbZntrP1xlig 7hpCLehz1Et0owMvVD3vhxV4yZrQGzdILClw+/zEMopGRrSKVnFfuWaMKd0NhnjxdoEb 2X0IFolQqxFyDpdqbKPZu8vDA9728rqaDsy90dp3f1duze/FtFvcaqifT/trgRtKRGkZ mk51OpwC2i36r56LTngnvMUgc85oD/34s6+Imgnf1JqM/s4JXh7lTSMS/enM0+2HVRVS ZBJOlp2tEyjQPjkLfoy4Xau4Q/aoipWGsBTLpPz4NxTbg8BtCGTvHlVBuYyFxbENiP1s MVhA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=GT2gVjPR6SgaNO/ooCzaomL/QLce0sgZc8b20aaRw0g=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=ZP6CVkX9t33ZbqV+HQilYq6+JV9IEu8eAaF5fk24j0VNb2jrF6YJcL126bC8dVmg4V CinHWOG3sVBH9gCBSpZL1GjHmmR+Vmo3U6bF+KYrS82mXiE18zn8s8IS6aAeAxx1Nx92 SQ8NbBXfDAyMcHCPhW9vqix6fDA2Qs6TzAdpEfO5G9cp28j7t6As0uS4USBDnNQtgY/F N37ujZqIVPcG73lf6xRL3D2n1j7xplkvbOk2ml8fz6IORKw1sRUXpKFQbTweL9l/WeoI PdQ0wl88b7NjrFs4cCj2boHObG5E5SX4tdGCn3cyQ/1PvCR/nWrr/l07r7tATVB8rVTp cPhA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=H+gbamLF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id r11-20020a0562140c4b00b0068efb5e3ba2si904184qvj.452.2024.02.15.00.17.02 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:17:02 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=H+gbamLF; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuA-0007kI-AF; Thu, 15 Feb 2024 03:15:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWtd-0007Cy-Dw for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:06 -0500 Received: from mail-pj1-x1032.google.com ([2607:f8b0:4864:20::1032]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtb-0001QN-MG for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:05 -0500 Received: by mail-pj1-x1032.google.com with SMTP id 98e67ed59e1d1-290da27f597so466759a91.2 for ; Thu, 15 Feb 2024 00:15:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984902; x=1708589702; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GT2gVjPR6SgaNO/ooCzaomL/QLce0sgZc8b20aaRw0g=; b=H+gbamLFlmM9NNvGzN2OwVWi3rlwwOWLdtjbkxG3rf/w2aJi26BM0tr/uNz6AWpt6U nhKXY79JVXI+li093NB6VSQngNLpJCDxQ+2V4B+4dCsKmzIxevUb8/hR9Ts2h/tF4NXU qCr6ItTWiKxCe4l5qejFJ9Dq+Iz3mmoFuSQwdMD3xRuP9ixzpE82VpLCYnsOCbNA7t2m 0Xy/p1MdS41vH6/v+sqz5mJiZ55/EVmbl0L5tujA+ZWgNcAcfhfcddUa/4Vkal1k1WrZ Ih0v+aBCdEM/erSNBHHT6vKy50lccG9FHNwlE5nkwr7Ay2TNX/M69oBEBkneHS2fEL+P 2cEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984902; x=1708589702; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GT2gVjPR6SgaNO/ooCzaomL/QLce0sgZc8b20aaRw0g=; b=vB+CrLX21LwR9Kukz8gPN2kL0eJczYxbQessVT2ZUE4uY89yxl3pIIxcQEyvMnZU3H KPp4GcIzcFu+iKBsgDT2kycfKgzSNOj8qfXqH/y9PzWdQ29jQA7I3zX8rVNbuEPs2Ht/ m9hIi2+3Fn52YGt3Uz8heikkEkyHcasaeIZy0jtmr7jIYv2YpJzfsjh+guHt+ZJF5d0y tRLT0mc+QftGvOecp5jL2sV/nLMXUNj3JexULX2Y141waS60R/qCP+GE+lDrmQCRCUdz W4dfhkBWhKbko4wiabPRHBIvkTOXNLQkNsjGassoCVh7ba5Ft8CuAsWsQyKhplvpAlxb GZVQ== X-Gm-Message-State: AOJu0YzromjsVXG/W8EvVtMjBPOJ+pqmE0qZcTa8QR+mZyFvh1Y/8qYW omaxvgeGzegkvGGXRpzyyuY7BL3t9qLBa5JDrvo7VDMOJdmujzIU27kKt7BUuCRYqoW3HISKZNN U X-Received: by 2002:a17:90b:1642:b0:299:165:b429 with SMTP id il2-20020a17090b164200b002990165b429mr963156pjb.23.1707984902259; Thu, 15 Feb 2024 00:15:02 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.15.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:01 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Date: Wed, 14 Feb 2024 22:14:47 -1000 Message-Id: <20240215081449.848220-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1032; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1032.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because the three alternatives are monotonic, we don't need to keep a couple of bitmasks, just identify the strongest alternative at startup. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 56 ++++++++++++++++++--------------------------- 1 file changed, 22 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index ce80713071..4eef6d47bc 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -180,51 +180,39 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ - - -static unsigned __attribute__((noinline)) -select_accel_cpuinfo(unsigned info) -{ - /* Array is sorted in order of algorithm preference. */ - static const struct { - unsigned bit; - biz_accel_fn fn; - } all[] = { +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_zero_sse2, #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, buffer_zero_avx2 }, + buffer_zero_avx2, #endif - { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, - }; - - for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { - if (info & all[i].bit) { - buffer_is_zero_accel = all[i].fn; - return all[i].bit; - } - } - return 0; -} - -static unsigned used_accel; +}; +static unsigned accel_index; static void __attribute__((constructor)) init_accel(void) { - used_accel = select_accel_cpuinfo(cpuinfo_init()); + unsigned info = cpuinfo_init(); + unsigned index = (info & CPUINFO_SSE2 ? 1 : 0); + +#ifdef CONFIG_AVX2_OPT + if (info & CPUINFO_AVX2) { + index = 2; + } +#endif + + accel_index = index; + buffer_is_zero_accel = accel_table[index]; } #define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { - /* - * Accumulate the accelerators that we've already tested, and - * remove them from the set to test this round. We'll get back - * a zero from select_accel_cpuinfo when there are no more. - */ - unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel); - used_accel |= used; - return used; + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; } #else bool test_buffer_is_zero_next_accel(void) From patchwork Thu Feb 15 08:14:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772877 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713264wre; Thu, 15 Feb 2024 00:16:34 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCUogFbpcsq3GH8k1BCyrWJA0eiyvM27+sKj3UYidy/v77EpzQ0nSfcyuX3cAvFIhRYl29Ve+sTLnznfOpZ4sXIE X-Google-Smtp-Source: AGHT+IHWpN3iefsw9PWqgdWIdhBNqE1CaZOiDuu+fLheElJ/PO5pedY2tZAIIAzPhrgYNUSbK2kB X-Received: by 2002:a05:620a:852:b0:787:2253:f9ef with SMTP id u18-20020a05620a085200b007872253f9efmr1014904qku.47.1707984994654; Thu, 15 Feb 2024 00:16:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707984994; cv=none; d=google.com; s=arc-20160816; b=qVUqjdyfWaNFA8sLWupQu/qjarobiKgNCun8IV0zCxt8k9ME0pPhMiXI08y0IecEK3 FbbKLODcBce439ngGE+/xuCQcDNk6uaKa6trFcEF1lhF0vGpePcF/oaHGrdq4G+H0ndw Q4XwCjCCPJuTwdJa8UTtFnAopieoiwdY/fHUV9FRvbh9afRB+nriMMnjhPmCD+A8ckio 50UbKTzmcYi22uaLPvKC70fIXBur/9n6CAU2DfzTDA/aK11cHcqiKLjv1sBgubJsWsTl 4BYBIapeeyrhC4I9xCjzda8eGS+ukAkgQxVzH4JyiNveMdZDaazkvkz6g/VuOOBHf6EH DS0A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=LcHS8hoi1egiJUTZk4E4y/djhNgaSWqSASRO2zuVQnQ=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=FT2puzGfZmq78uNuUQyun/MuzL3jLeJ6uCvINKtEujHWUq8+BydimwDFy6tZRek6ti eGZ62NPAjlrj/jhHaK4sRIs4tVpgzFnfEnuRiUes+NkgMVVDkmYyetsStZun0cAfVhPv 3QYOY0zGqWghjbKJtEetzmUn13vSHe8yELQR5nZczbzYgo0YweMK0SqvOmkATeMU/U54 DpnVB6Wk8CSfTdTMraXFrLpsmP46RphkueH1oh+/oPW408txFvfWmaQuilcS+CIhXQWH ymF/SXrqyS0Ll6h2k5fSYYIo1JRyAVYLjYvPw6ltHnRgLZl9VyRD1eoSLCxwTCVHFYHj PCsw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=nl58qaCR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id qb6-20020a05620a650600b00785b33ee81fsi1051485qkn.623.2024.02.15.00.16.34 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:16:34 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=nl58qaCR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWu8-0007cZ-5w; Thu, 15 Feb 2024 03:15:36 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWte-0007DT-Rs for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:08 -0500 Received: from mail-pg1-x529.google.com ([2607:f8b0:4864:20::529]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWtc-0001aA-Vn for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:06 -0500 Received: by mail-pg1-x529.google.com with SMTP id 41be03b00d2f7-5d8ddbac4fbso512868a12.0 for ; Thu, 15 Feb 2024 00:15:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984903; x=1708589703; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LcHS8hoi1egiJUTZk4E4y/djhNgaSWqSASRO2zuVQnQ=; b=nl58qaCR9J7WIXMixV12jPnvQKM3Ubuh2eMGbMGT/qbvYSFqvSmUaW6C6/PsL92jN1 2/nOgMfihxlLLu/0XWjrn4tioz46du1BqfXhV0tfRDkoEtBjD3HJlxQFVcu9sGsHPeAJ Q+/6mbybfb0TiYQUI6tS/MuK2w6sgOxLkwHOct4OcyDHVHOsR23ZBDNk5Nzal2Hw36Jo C2IeuRCXa82RnrHmBxnArO0wQll713nBcu/4iOy3+gKDZ/lXm1KAcurQJHox3fOiSJlM xyPLjSBDuUBIDGxeCy2dcKqOePiGVL7Cjy8sDE47Zx8Rs7aM4DNnjnR+lLHynww7ukhK ZfSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984903; x=1708589703; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LcHS8hoi1egiJUTZk4E4y/djhNgaSWqSASRO2zuVQnQ=; b=JdneTIrRGPLb/nkv/nEChhFPJbf6jBdbqt5AFpu98qvrSNq/MuVfesGM0oj3c1Yulf O78a5RXg4MS3B4Zne88CpFSZB9duONmekHtlVuVTcrMXHQ9HAaRZa243moOtckjEX8RL MVZ7hJwu2002HqHop19kJOdLOT22IRDz6e/y+dsfXRRoAAV5RoesPXQLJFQa/TOOEa9M 3w9UwYAVBBtL2agYXWDp6Y+fHRoM5lPd7nlppMrYJStpOZDrBk5TmShmFS5Qs9qk43jG LClWyK2+rT4Y37Q+KJFZUJUAcgJM7BldgrX2BlrXkW9T5pebfGV7WbhHSGeDQ5qpfDi7 sJSQ== X-Gm-Message-State: AOJu0YzueWXD0mN1QoKtG6ezbmBA8OMIWyPlbfilDn59MtQcb58yXlgk wlpPZF0En3kAlmM5IrDrP80wWGI4BuaHPSjqJ9JGq5eF9GZ442dbxZEfJ5J6Z/d4QO8BJsbWpou i X-Received: by 2002:a17:90a:c587:b0:298:c2a8:4ade with SMTP id l7-20020a17090ac58700b00298c2a84ademr953288pjt.28.1707984903549; Thu, 15 Feb 2024 00:15:03 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.15.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:03 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64 Date: Wed, 14 Feb 2024 22:14:48 -1000 Message-Id: <20240215081449.848220-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::529; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x529.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely double-check with the compiler flags for __ARM_NEON and don't bother with a runtime check. Otherwise, model the loop after the x86 SSE2 function, and use VADDV to reduce the four vector comparisons. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 74 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 4eef6d47bc..2809b09225 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -214,7 +214,81 @@ bool test_buffer_is_zero_next_accel(void) } return false; } + +#elif defined(__aarch64__) && defined(__ARM_NEON) +#include + +#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1)) + +static bool buffer_is_zero_simd(const void *buf, size_t len) +{ + uint32x4_t t0, t1, t2, t3; + + /* Align head/tail to 16-byte boundaries. */ + const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + + /* Unaligned loads at head/tail. */ + t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16); + + /* Collect a partial block at tail end. */ + t1 = e[-7] | e[-6]; + t2 = e[-5] | e[-4]; + t3 = e[-3] | e[-2]; + t0 |= e[-1]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + /* Each comparison is [-1,0], so reduction is in [-4..0]. */ + if (unlikely(vaddvq_u32(vceqzq_u32(t0)) != -4)) { + return false; + } + + t0 = p[0] | p[1]; + t1 = p[2] | p[3]; + t2 = p[4] | p[5]; + t3 = p[6] | p[7]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + p += 8; + } while (p < e - 7); + + return vaddvq_u32(vceqzq_u32(t0)) == -4; +} + +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_is_zero_simd, +}; + +static unsigned accel_index = 1; +#define INIT_ACCEL buffer_is_zero_simd + +bool test_buffer_is_zero_next_accel(void) +{ + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; +} + #else + bool test_buffer_is_zero_next_accel(void) { return false; From patchwork Thu Feb 15 08:14:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 772875 Delivered-To: patch@linaro.org Received: by 2002:adf:9dc2:0:b0:33b:4db1:f5b3 with SMTP id q2csp713015wre; Thu, 15 Feb 2024 00:15:44 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVpv6gVwOV93cQ//e35Kt5dU33IT+j34CVBCkt/PYl7Y9J/EZr4XNy6WBeQ75v0gyIwIlT7xqFzhfIlY453MDdW X-Google-Smtp-Source: AGHT+IHmK0U2JTvoplfpDDmMRaygDBfZBJ9tBQs+0B3yMzEtvf2vBv2142fMVBr3FUYEzQPb/NLl X-Received: by 2002:a05:6808:309d:b0:3c1:365c:57a1 with SMTP id bl29-20020a056808309d00b003c1365c57a1mr1329105oib.6.1707984944661; Thu, 15 Feb 2024 00:15:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1707984944; cv=none; d=google.com; s=arc-20160816; b=vRNqbGgI4ak/XNkxIJmLPpSErWVexdpwYIKO0FVKANK0WnVzAb6zeYKs5DvcIfpQ2L gT8Y6cGahgyttvp6hTY6IsLZIw/QeJveVL7sF+PFnY9OhNI7vjYzp1QLQRRttPZ+qqQm AVNy2pO0AP+DpYSOXTazvyaPV+G49nA1qyLfM6dQeSbYZ0W7Un8vzMTUb3h3d4W8OLZt xm3ay8CpRbbLuJ++v5663nmlVvRvSjhtE6+2HC7WtA4bDJWGL8YfD9kO9b2FfX6goU6+ 9Bm0nVddedzOlI7qFRUCnd8S9HBBYkNqxM7nJ30ylBr1aJvRks45xb7dvCLQrGkE3b+C O+7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=rKwcsoqbJERmdkOS9kUB6SrPLVGyNwfg2Re2n5hH/Bw=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=BtcYDwyx1l4786GS2paPWVL2dcbJM9j4NczYDp+YFTJkoIu0wHe5fdB4G61SfS9Q5Z 4fU9yH3VVDfinF06ohpUSQaJOEbt1F2iFfh1BcyTlU7+93HXacrZti8z5GvbPpUZbdcK o0YwtBCkLQqSiEO11LesmWQkLxrZamXVtZwTGbT4WNkiYom9EC9a0Ys+uBQE13RTc0q5 F7EosWyk0STrnSGGB0Z+caBoajaSzx2DQyqYc52unovQLDIyayU0K1XCKv8ciBZ70qpp 8+up3equwe+PTqY/RzpVXgZl+jiI0xNagjKNnCTlBbtKDV0tZKPS3KbToPty5O/pcsMl mTIg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=tRUrP5e6; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id u4-20020a05622a198400b0042db1bed50asi882163qtc.582.2024.02.15.00.15.44 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 15 Feb 2024 00:15:44 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=tRUrP5e6; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1raWuB-0007lG-QB; Thu, 15 Feb 2024 03:15:39 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1raWth-0007Dz-8B for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:10 -0500 Received: from mail-pg1-x52e.google.com ([2607:f8b0:4864:20::52e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1raWte-0001bl-5n for qemu-devel@nongnu.org; Thu, 15 Feb 2024 03:15:08 -0500 Received: by mail-pg1-x52e.google.com with SMTP id 41be03b00d2f7-5ce942efda5so490502a12.2 for ; Thu, 15 Feb 2024 00:15:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1707984905; x=1708589705; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=rKwcsoqbJERmdkOS9kUB6SrPLVGyNwfg2Re2n5hH/Bw=; b=tRUrP5e6NdYPo3/srRcWW+J33mNgrgP5pH11LQoIQ7QHX6XqK5T2Q9cMbTPR7Ozyha bJpHHXWshDE3kfxcq0PAdFl6SRlSwxs19PXqUjNocxTaES6sZQQ2Pv2FySCfyi7AouZv 6JDwcncYI2DpmGOdO8/w3U8orXdOLfGS2g63bMNX5vSgVt7vo72SfsPANrwdOAl+Cf20 M3SVJtq4V7LEbdbVLmE+48kx10CJO3H7m5ySvkC/el1+9NizhhFj8Wq02TdUvrc+LIYn jeZsbBZRdJBEjagNc56siuWU1KzhaBc1Pb4szEQ4PCgas6FR7NSlGrndkXrDx3FAC3tX Fhtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707984905; x=1708589705; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rKwcsoqbJERmdkOS9kUB6SrPLVGyNwfg2Re2n5hH/Bw=; b=QJyHzbCWZ9SnN7D2cawUk4qIinJ7cIzfdZ2jVraXf4bMLenclf3Nrawhv/xxJgCP/m bfd//hzOgR+lIBFmrIO3GKbl3U7L6XrYS12McwXvwmh4rnKq6MAVFwsmVzEgtacNdLiW l2STdQoSWjr6b9yhL95iPto4eBQnIYJoS4YnjqPAc86kSM3tONYTxQjkoqpljSIM/rhj tLsE4LZGYpZhFDBYt7eVljRPz33134u5gEdfbbzyI8NJyzOFem+YuWS/z4kxu8sKtKEs HckWPhVu3jQNNzQX6XQMVOWoUEfrl5sQluytEqERFR2MWvL4GTahEKrmxed3H9qsZa2s Q/hg== X-Gm-Message-State: AOJu0YwUkt/W635hKlA1lWQbf6dZ7pJeuZ9y1Y4U7laC3a1WZWNAYXDA lz+/VSdBf7cNgsU6d0xzp00iTb8je2P/j/uBcKVCU44fBuZQPpCrl3aKpm93XC7MDTobAJeeMVV D X-Received: by 2002:a05:6a20:20c1:b0:19e:b534:1bcb with SMTP id t1-20020a056a2020c100b0019eb5341bcbmr1028640pza.23.1707984904797; Thu, 15 Feb 2024 00:15:04 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id qc14-20020a17090b288e00b0029900404e11sm807755pjb.27.2024.02.15.00.15.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Feb 2024 00:15:04 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [RFC PATCH v4 10/10] util/bufferiszero: Add sve acceleration for aarch64 Date: Wed, 14 Feb 2024 22:14:49 -1000 Message-Id: <20240215081449.848220-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240215081449.848220-1-richard.henderson@linaro.org> References: <20240215081449.848220-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::52e; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x52e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Signed-off-by: Richard Henderson --- RFC because I've not benchmarked this on real hw, only run it through qemu for validation. --- host/include/aarch64/host/cpuinfo.h | 1 + util/bufferiszero.c | 49 +++++++++++++++++++++++++++++ util/cpuinfo-aarch64.c | 1 + meson.build | 13 ++++++++ 4 files changed, 64 insertions(+) diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h index fe671534e4..b4b816cd07 100644 --- a/host/include/aarch64/host/cpuinfo.h +++ b/host/include/aarch64/host/cpuinfo.h @@ -12,6 +12,7 @@ #define CPUINFO_AES (1u << 3) #define CPUINFO_PMULL (1u << 4) #define CPUINFO_BTI (1u << 5) +#define CPUINFO_SVE (1u << 6) /* Initialized with a constructor. */ extern unsigned cpuinfo; diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 2809b09225..af64c9c224 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -270,13 +270,62 @@ static bool buffer_is_zero_simd(const void *buf, size_t len) return vaddvq_u32(vceqzq_u32(t0)) == -4; } +#ifdef CONFIG_SVE_OPT +#include + +#ifndef __ARM_FEATURE_SVE +__attribute__((target("+sve"))) +#endif +static bool buffer_is_zero_sve(const void *buf, size_t len) +{ + svbool_t p, t = svptrue_b8(); + size_t i, n; + + /* + * For the first vector, align to 16 -- reading 1 to 256 bytes. + * Note this routine is only called with len >= 256, which is the + * architectural maximum vector length: the first vector always fits. + */ + i = 0; + n = QEMU_ALIGN_PTR_DOWN(buf + svcntb(), 16) - buf; + p = svwhilelt_b8(i, n); + + do { + svuint8_t d = svld1_u8(p, buf + i); + + p = svcmpne_n_u8(t, d, 0); + if (unlikely(svptest_any(t, p))) { + return false; + } + i += n; + n = svcntb(); + p = svwhilelt_b8(i, len); + } while (svptest_any(t, p)); + + return true; +} +#endif /* CONFIG_SVE_OPT */ + static biz_accel_fn const accel_table[] = { buffer_is_zero_int_ge256, buffer_is_zero_simd, +#ifdef CONFIG_SVE_OPT + buffer_is_zero_sve, +#endif }; +#ifdef CONFIG_SVE_OPT +static unsigned accel_index; +static void __attribute__((constructor)) init_accel(void) +{ + accel_index = (cpuinfo & CPUINFO_SVE ? 2 : 1); + buffer_is_zero_accel = accel_table[accel_index]; +} +#define INIT_ACCEL NULL +#else static unsigned accel_index = 1; #define INIT_ACCEL buffer_is_zero_simd +#endif /* CONFIG_SVE_OPT */ bool test_buffer_is_zero_next_accel(void) { diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c index 4c8a005715..a1e22ea66e 100644 --- a/util/cpuinfo-aarch64.c +++ b/util/cpuinfo-aarch64.c @@ -61,6 +61,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void) info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0); info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0); info |= (hwcap & HWCAP_PMULL ? CPUINFO_PMULL : 0); + info |= (hwcap & HWCAP_SVE ? CPUINFO_SVE : 0); unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2); info |= (hwcap2 & HWCAP2_BTI ? CPUINFO_BTI : 0); diff --git a/meson.build b/meson.build index c1dc83e4c0..89a8241bc0 100644 --- a/meson.build +++ b/meson.build @@ -2822,6 +2822,18 @@ config_host_data.set('CONFIG_ARM_AES_BUILTIN', cc.compiles(''' void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); } ''')) +config_host_data.set('CONFIG_SVE_OPT', cc.compiles(''' + #include + #ifndef __ARM_FEATURE_SVE + __attribute__((target("+sve"))) + #endif + void foo(void *p) { + svbool_t t = svptrue_b8(); + svuint8_t d = svld1_u8(t, p); + svptest_any(t, svcmpne_n_u8(t, d, 0)); + } + ''')) + have_pvrdma = get_option('pvrdma') \ .require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics libraries') \ .require(cc.compiles(gnu_source_prefix + ''' @@ -4232,6 +4244,7 @@ summary_info += {'memory allocator': get_option('malloc')} summary_info += {'avx2 optimization': config_host_data.get('CONFIG_AVX2_OPT')} summary_info += {'avx512bw optimization': config_host_data.get('CONFIG_AVX512BW_OPT')} summary_info += {'avx512f optimization': config_host_data.get('CONFIG_AVX512F_OPT')} +summary_info += {'sve optimization': config_host_data.get('CONFIG_SVE_OPT')} summary_info += {'gcov': get_option('b_coverage')} summary_info += {'thread sanitizer': get_option('tsan')} summary_info += {'CFI support': get_option('cfi')}