From patchwork Sat Feb 17 00:39:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773708 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206882wrs; Fri, 16 Feb 2024 16:41:36 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVoG4vfI1D6m95pqAU7t5kYg9buCnJYonvgMArEpdqW5tFPbKF8d+lYYIQgZYryHgXmwSlx+w71cqJ1q/T5ChUg X-Google-Smtp-Source: AGHT+IGrwuhd0F5HHeDcynom73+CQ/JuJDEEXDR5ZbDtMod8GLUpFTXLX+ZK3F0QskxCboBNH6p4 X-Received: by 2002:a05:6808:291:b0:3bd:956a:1aa3 with SMTP id z17-20020a056808029100b003bd956a1aa3mr5650188oic.57.1708130496514; Fri, 16 Feb 2024 16:41:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130496; cv=none; d=google.com; s=arc-20160816; b=NBKWgES+QYLFaU81rbEOqrJ20wFxu8MRNrnhjnbkOf8nmWy9qZODJxBb+LODkP8n2M 0ImB1MnbmINxHVrptGsggSlS/sA8mW58PJNgNI8GkmPA/q8hFTQw0H7adfRGcyDGYgFT zHfaELTobPptTAf515SLs8ii5beHPmoaDoXQS/c7AGjPTbneC3oXieDb44fbPQIfLGcq vrN6F2Rn7yBvZbFUFpJYzG1NB66y0LvnfFflqo1Vno3Aay+OwMO3MwAdCFr7564yBIE3 ataPGp/JvJi8Z2aLd7YFLgp7p1S67XxWZDezSfq+F8e0xJwMlmTmWYPAzwmq25kKRf6X Pfuw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=Jt2CypKXQthg9hDZ6XQhYRTkXcjAyzjKFdCruD8Dp2vhxUz78LmcD+XJHcYJf5iuGF hTTzD1uVPTiP/qX530PNm2cYHVfwCczJUPEj+nu4DOX65Bj8BcjgraFQZL1/y26Rn9NW RKHiw9EEuM66p2w7RqJpGJMUsXq7TicTtum1GXSXGi1/as6eQSOnbiKOyZyvP/zuVu6N hYMzhOlRtk99FLHTa+zqmKy/q/aFpc6SKLm93uTkwoMBJr+czDQmjzUZs4j/q0etmLFa kuDPKIzgEUmN/Uk+/FMaHfeQVd5aKBXt8oqifMI+wQk7lT39KZyAb1HjcEtWDYzKyQE6 CtvQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XvDDbv4q; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id m14-20020a05620a220e00b0078734b9c63dsi1094411qkh.657.2024.02.16.16.41.36 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:41:36 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XvDDbv4q; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8jn-000646-UP; Fri, 16 Feb 2024 19:39:27 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jm-00063l-B2 for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:26 -0500 Received: from mail-pl1-x630.google.com ([2607:f8b0:4864:20::630]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jk-0008FB-JP for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:25 -0500 Received: by mail-pl1-x630.google.com with SMTP id d9443c01a7336-1d934c8f8f7so27591215ad.2 for ; Fri, 16 Feb 2024 16:39:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130362; x=1708735162; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=XvDDbv4qQrIQj7MG9zVeMlVChMd0+JgPo9jXjdSOZCgwgMfPWGI7hjpSEsnwE1vtbf Fbs/wU38AJ8HT148xV8fNLiFdS4TC2SSVYP9sI4KxtPnZIfaXWlmj2XEvOXYRUHTHC9M M/r1nEryzHPiFbS6Xen+upVb7Qa4z0jiEeFkyKjFlGKRNW8nyIGtVhjem0Mcbt3DzsJp aDJj0YatcNJOJyNtyo1oj8BwQUpmrMpa+1IWGMrnIrUGkS4vXO9H3m096qC/g71ttrNI 9T+rLhzfQdF5ywtSCi9iSTaqVW0G/DvG/0wK2F97SVktgkV74H9uOrRBBGKoMQjuu78U XhSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130362; x=1708735162; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=ny597UNYM3sAnQerPUNMlw/yx4TTT7/b7BwiEjrSFXgOAVOpuWahNHgTE27jk8k6hM maj62NIVW7f6beONZv99JA4MXGtsr4cWNGNuprf4dlJ3m/NqXc3v3qao0GdV3aMhEUEt iXRiH1fnBDizE9EabxYYTA8cgqjfcIkqbh8FQZw0k5eUnm0YjJwIOizdvDlPLa53a7sH Rc6tjl2kEUgqEhcvOYmRuSQvoOdbLZ/fctu8gUJReA3GF2PKQERMDliWjStaJmSS0lEc dDZGhQKxjcL4pWlb+yEH1Gm0hASk2sZDbmlS0TQ5TCf+dbZO6WCRT5C93x0VFTt9ed0R tyjg== X-Gm-Message-State: AOJu0YyFqv0tXV3xRqrPRJ1ydDM1baryAkHLejmAjvVT/7wLaQROfp35 6oPRlwyqtDmzzVM9gIkJFo2JQBIKWeUufH2j3daDTnlkUbRlvFQxgwZgDU8XC6USoZTlXWQfLtL i X-Received: by 2002:a17:902:e806:b0:1db:c6a0:d023 with SMTP id u6-20020a170902e80600b001dbc6a0d023mr368588plg.8.1708130362643; Fri, 16 Feb 2024 16:39:22 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:22 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 01/10] util/bufferiszero: Remove SSE4.1 variant Date: Fri, 16 Feb 2024 14:39:09 -1000 Message-Id: <20240217003918.52229-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::630; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x630.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov The SSE4.1 variant is virtually identical to the SSE2 variant, except for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing if an SSE register is all zeroes. The PTEST instruction decodes to two uops, so it can be handled only by the complex decoder, and since CMP+JNE are macro-fused, both sequences decode to three uops. The uops comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch standpoint. Hence, the use of PTEST brings no benefit from throughput standpoint. Its latency is not important, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-2-amonakov@ispras.ru> --- util/bufferiszero.c | 29 ----------------------------- 1 file changed, 29 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 3e6a5dfd63..f5a3634f9a 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len) } #ifdef CONFIG_AVX2_OPT -static bool __attribute__((target("sse4"))) -buffer_zero_sse4(const void *buf, size_t len) -{ - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - __builtin_prefetch(p); - if (unlikely(!_mm_testz_si128(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_testz_si128(t, t); -} - static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { @@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info) #endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, - { CPUINFO_SSE4, 64, buffer_zero_sse4 }, #endif { CPUINFO_SSE2, 64, buffer_zero_sse2 }, { CPUINFO_ALWAYS, 0, buffer_zero_int }, From patchwork Sat Feb 17 00:39:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773704 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206715wrs; Fri, 16 Feb 2024 16:41:09 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCUtXx93N4S7PxRfXOjuJYXzTb1ETlItsRBHn0H7UdKw0K40vUS4mAY8xLnEa6qPhrERP0yuXyeMHXRCRe1Y9afH X-Google-Smtp-Source: AGHT+IG8HKyGsCadDeM/d9aKM9D/eHYz4dG9zdP9eQeqGfVllHQWgrWS1VzSvuZ1ncVBngLqfsGQ X-Received: by 2002:a05:6808:15a7:b0:3c1:4a3e:bfd0 with SMTP id t39-20020a05680815a700b003c14a3ebfd0mr689167oiw.10.1708130468901; Fri, 16 Feb 2024 16:41:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130468; cv=none; d=google.com; s=arc-20160816; b=HhLZ8H7+YTYkNdR4pUDoE/YxxpfNzsKoZ1QdjEuZGXGkcQ+7JM1zLI00MPQIoR8xEG JoUQul/ibH3gADVfY07ERZC3pOWaFYdNx7Ya+pD/M7jKFlDgyW38F2vZuOQr/11j0SsH IfAJaYaavqyzd+XDt+dbXTI4oyV29+u/eb/tQqjqhOJoBtTJCQHAXKsaXq/8UpZu5Ipu BOR1no+sCNNs2RNh9dkMtL2Q4CD7elb0SSQT+GQozeZKAZHPtohNws6DmWNQMSkvP8NA q7Zg+BSTHydfwLTJ6gERInejV5qZlI+fSgK1mH+P0jShaQe8O2ECNMPSd54GWZH9LVt2 Lc/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=kgIpiJGMfljc/1sWPPyNC/Bfi4PNK8XzAJF2CX9TRViSDIfyRn8OqOfMfRh4H7+vqk vo2+tVE8dCEpE/3Fjf+pgGylPquIf76Bx9e44BacBNDiIFtr9s4pLH4JPPuMP7yOuLiq F8/UAxtF/j5QzPXwy0hm96tkCt6t024LMXON8dUUEl+0gCbPlp/oGwA4sxPSK98vKhyo IkH5IZ9NyhLHvVDbyqGeaq8MpN/roEyAL8EfruvZiIZJUxGlON8KiUXhF5Lk9iCnr3ow 329JuHrvpcoyCqk1BcEy9XdWqZhBMUyWdljfFiahEfuoTrlp74QFk/mHYvazZhdKXguH fBog==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=PsmPtOjw; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id t2-20020a05621421a200b0068ccc9a313csi917285qvc.606.2024.02.16.16.41.08 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:41:08 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=PsmPtOjw; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8jp-00064g-6t; Fri, 16 Feb 2024 19:39:29 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jn-00063x-Ld for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:27 -0500 Received: from mail-pl1-x62d.google.com ([2607:f8b0:4864:20::62d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jl-0008FM-E0 for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:27 -0500 Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1d953fa3286so11474115ad.2 for ; Fri, 16 Feb 2024 16:39:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130364; x=1708735164; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=PsmPtOjwIOUa0GBrFYzxy0RmwnLHQaK2UN5QIV+4QaMnBXDFOaDouihhHYDvn7UOBz 8plMNwvieI+/46Qlpa/Fe3Ag6drshE94ewgH1ofo9rCLiEQpb3N7j/2M7+xQr05711UA jSCSouLYdwviT2LJ+DdL7WLJnF8vwkb6C/+dhLWZYjIRpizrLJpmj/+jEgw6oCvKPpcq O4pNb1grhHkgknhyO/UEb9lqCCe6WX0/cTOrhssC4FdgR/biIXDg0YJo100NVY+0W3IY K+P5usF6ROlgUiEoLLUdUCEXbk2a5yX9++rvxwLYaQ3QVrB+LupMvIVBf8+9VQmj1uyM +alA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130364; x=1708735164; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=bXUUKA4duaW/mJQU9j8kjX3JUy9zqTI/lwhAYX73CkiwvGOjHmXXA6su5e2ftDBoKf BIqvViCv69RD42ahSWBhAOGtdTXKhWMIdQ4+tS09o45jWN6aNBGH0cqUZseQzYPTXCS8 gazNcdIln4dvdj8rUGOy3ZQ2vpMx+31fNUm9KXozD4kYlepyxhajZ9bwbWxa/V+4Hx2S VIXXNu6qi860Pli3RsFFijgfAGnhGPFI8VumP2voMTPrMlrpdMQDSkP5l3rJUTBS1v0c TZZvJ5GaqTEt3xT090itXjs74bBSqWfow31TcGABNhvlvAmNp7sLGupvH+kMf/rImVaU IqJg== X-Gm-Message-State: AOJu0YwvGTOuQn6DBpbGqIr+rhrR8x2ltfd79h5eYGnJGu8arwuOC1gH ys89fkenr3U8LT8fMHaqIXKDoMJDXvnLRW5HTQz4hiiSbxRySRq+ZYr5u0NZyilrdV/AgQqSgXT U X-Received: by 2002:a17:902:c944:b0:1db:c1a3:7d59 with SMTP id i4-20020a170902c94400b001dbc1a37d59mr1228054pla.9.1708130364123; Fri, 16 Feb 2024 16:39:24 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:23 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 02/10] util/bufferiszero: Remove AVX512 variant Date: Fri, 16 Feb 2024 14:39:10 -1000 Message-Id: <20240217003918.52229-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::62d; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x62d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD routines are invoked much more rarely in normal use when most buffers are non-zero. This makes use of AVX512 unprofitable, as it incurs extra frequency and voltage transition periods during which the CPU operates at reduced performance, as described in https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html Signed-off-by: Mikhail Romanov Signed-off-by: Alexander Monakov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-4-amonakov@ispras.ru> Signed-off-by: Richard Henderson --- util/bufferiszero.c | 38 +++----------------------------------- 1 file changed, 3 insertions(+), 35 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index f5a3634f9a..641d5f9b9e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len) } } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__) +#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include /* Note that each of these vectorized functions require len >= 64. */ @@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -#ifdef CONFIG_AVX512F_OPT -static bool __attribute__((target("avx512f"))) -buffer_zero_avx512(const void *buf, size_t len) -{ - /* Begin with an unaligned head of 64 bytes. */ - __m512i t = _mm512_loadu_si512(buf); - __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64); - __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64); - - /* Loop over 64-byte aligned blocks of 256. */ - while (p <= e) { - __builtin_prefetch(p); - if (unlikely(_mm512_test_epi64_mask(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - t |= _mm512_loadu_si512(buf + len - 4 * 64); - t |= _mm512_loadu_si512(buf + len - 3 * 64); - t |= _mm512_loadu_si512(buf + len - 2 * 64); - t |= _mm512_loadu_si512(buf + len - 1 * 64); - - return !_mm512_test_epi64_mask(t, t); - -} -#endif /* CONFIG_AVX512F_OPT */ - /* * Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. */ -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) # define INIT_USED 0 # define INIT_LENGTH 0 # define INIT_ACCEL buffer_zero_int @@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info) unsigned len; bool (*fn)(const void *, size_t); } all[] = { -#ifdef CONFIG_AVX512F_OPT - { CPUINFO_AVX512F, 256, buffer_zero_avx512 }, -#endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, #endif @@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info) return 0; } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); From patchwork Sat Feb 17 00:39:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773701 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206573wrs; Fri, 16 Feb 2024 16:40:33 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCX6m0d/vob32lBNGyfm7D3ls1NHR1oub9gQiU3BATpFshinXe5NXIEam7yRdF2Lcpiw+W3SWSMX+7JRq4aJy3af X-Google-Smtp-Source: AGHT+IEnopfVCJsbCR35Giuk2lF+mbmn/iTAD/Z6Xbsj1U+QXslpflZGnHLWhaRh1yM+GCYlN3VU X-Received: by 2002:ad4:5b8b:0:b0:68f:2c23:dcd8 with SMTP id 11-20020ad45b8b000000b0068f2c23dcd8mr8753597qvp.29.1708130433226; Fri, 16 Feb 2024 16:40:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130433; cv=none; d=google.com; s=arc-20160816; b=GMRk9gznechEhTc92tu2eJsfMTbEjyd02rMszgut0LJByuvv0oAnjr4PxBOsayCS2F sa8ZAjMQpHx5GMJcPJx+RbXMJqfRDYh4B4VZdh/ATJqon6DbNbX/lcJV+QyyP2wicXRS iDpWydJvLnGuHRUqrIN1QrIPV3+r6nPWxXlPbVSyOqWdkpD9an4rCGvUSaTRFHy3asHM xwIuRv6EBiWwChUf9NiJ7YmIU3+YDwH9twCBl5mq8iWcicUliK6Q+0qftwYowo/JzH00 lTN4ronVTfJfI0PIM/nEdG8oAxtFFYnMMp7vvSwFidLqV1kHgT08vA/NSN74vj8Pakdt lzHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=tPAk+hrFCZ7I54pjJraxLAU4XHEDW0HhS/TFBMU1LRERSLhhgS8AP7KC+VOjfj5ver ryVHcs2gjifiVnq+Pi0bgZTxj5ViQibDkmVaUFzTHeXCyoDccnDo3vTlCOinxrhK5S3O 2h2U/RBweDb+OO6fGo2uz95xJDlxoaRaY8b1E41+VURrtqZGAncv7P0HTfU0UJYtMDpl Hf3KoGy/LpWzeHDoXecYIUmWHY9lGqE5tuttqH7VqnkNLL1X/VKggLSZkj/OTY6+Mj+g kwZ9PalqkcQhx3Sg7QhPob9jlGtTpSWUGIbaNFdDBoZVzP8j3Z3dNzDhsgulk3wWkf9B e7kw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=AcvJhLPC; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id a2-20020a056214062200b0068cb5e946cesi1007010qvx.506.2024.02.16.16.40.32 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:40:33 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=AcvJhLPC; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8jq-00065p-Th; Fri, 16 Feb 2024 19:39:30 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jo-00064f-U9 for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:29 -0500 Received: from mail-pj1-x1030.google.com ([2607:f8b0:4864:20::1030]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jm-0008Fc-Rv for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:28 -0500 Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-2994fb5ad60so266805a91.0 for ; Fri, 16 Feb 2024 16:39:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130365; x=1708735165; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; b=AcvJhLPConnLxxcYzgup4DvQtc/+R86rkd1bEApJFp9mAQ4hK/JQBexKd2x3CKM6tk R9ubGZLzJbpkiEoxbQCaVSBbvaa4vB8DmjRtbvTvATBR7T6Y+1wCFNmI4/8PFOTeNJ/z OhXJmanESzNH86emns6WglE1cwRNgdZvjfhblH8Wc4zWNMa8L8ki2NUsO2++TQG4M/KC YPXeccpZ4k7Oknb22w923AT9bu9l1ghN4iTlPn4W+HmCZGJM3wWnXA/L4PUH7ptyxLl3 bCYsvs2CP3/8TigjzOZHm3kOeMUPfXPUMFHgOkGvbSLuN3pEjru3STZ2kAvCVP3xlQK0 hdgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130365; x=1708735165; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; b=XDNzbrK8byzNh3KxPsw4TbM8yPFGWKhbWM/QKdybRMgiQ77tSadgkAm5cRL5a8XWvq 1FeFzRyR41/7Tq2m30BIvFUkMXAvZKnymYn7ujHgI93SKzwsjpfW4rGVlsbA268VdQEV 1+zx6HgKtuewSuvqPMVcXpZXyfJaScogLRm+rlbacuexODPenYGn4QWFl+QGhXAKPIiE HKr8YegneiWjOCupwS/IFIcEoHzb6Kac8XKtoyGR2c0OGJRJ+ut783gAt60LE3Gc4vQ7 SGZEuenxq1gvVh15UaECDqJc8FGdawUHV2unK8OPhKc6tKHanboe9qV1pRKyPZrdPgsR G3QQ== X-Gm-Message-State: AOJu0YyVaokqyxlKI5DNz/vCrOCBJKWxk0ukhFrL4P6ZBuTJvXNPbPlS 4trFU1QhOD19xNbqbijtCpfdPvWaxHEunwpTs7Sy5IRo0GHrGgzuWW4IlRL0qcb1TWJBUv9iEHf 7 X-Received: by 2002:a17:90b:617:b0:298:b733:d9ad with SMTP id gb23-20020a17090b061700b00298b733d9admr13596577pjb.17.1708130365440; Fri, 16 Feb 2024 16:39:25 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:25 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 03/10] util/bufferiszero: Reorganize for early test for acceleration Date: Fri, 16 Feb 2024 14:39:11 -1000 Message-Id: <20240217003918.52229-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1030; envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1030.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Test for length >= 256 inline, where is is often a constant. Before calling into the accelerated routine, sample three bytes from the buffer, which handles most non-zero buffers. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Message-Id: <20240206204809.9859-3-amonakov@ispras.ru> [rth: Use __builtin_constant_p; move the indirect call out of line.] Signed-off-by: Richard Henderson --- include/qemu/cutils.h | 32 ++++++++++++++++- util/bufferiszero.c | 84 +++++++++++++++++-------------------------- 2 files changed, 63 insertions(+), 53 deletions(-) diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h index 92c927a6a3..741dade7cf 100644 --- a/include/qemu/cutils.h +++ b/include/qemu/cutils.h @@ -187,9 +187,39 @@ char *freq_to_str(uint64_t freq_hz); /* used to print char* safely */ #define STR_OR_NULL(str) ((str) ? (str) : "null") -bool buffer_is_zero(const void *buf, size_t len); +/* + * Check if a buffer is all zeroes. + */ + +bool buffer_is_zero_ool(const void *vbuf, size_t len); +bool buffer_is_zero_ge256(const void *vbuf, size_t len); bool test_buffer_is_zero_next_accel(void); +static inline bool buffer_is_zero_sample3(const char *buf, size_t len) +{ + /* + * For any reasonably sized buffer, these three samples come from + * three different cachelines. In qemu-img usage, we find that + * each byte eliminates more than half of all buffer testing. + * It is therefore critical to performance that the byte tests + * short-circuit, so that we do not pull in additional cache lines. + * Do not "optimize" this to !(a | b | c). + */ + return !buf[0] && !buf[len - 1] && !buf[len / 2]; +} + +#ifdef __OPTIMIZE__ +static inline bool buffer_is_zero(const void *buf, size_t len) +{ + return (__builtin_constant_p(len) && len >= 256 + ? buffer_is_zero_sample3(buf, len) && + buffer_is_zero_ge256(buf, len) + : buffer_is_zero_ool(buf, len)); +} +#else +#define buffer_is_zero buffer_is_zero_ool +#endif + /* * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128) * Input is limited to 14-bit numbers diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 641d5f9b9e..972f394cbd 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,8 +26,9 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool -buffer_zero_int(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t); + +static bool buffer_is_zero_integer(const void *buf, size_t len) { if (unlikely(len < 8)) { /* For a very small buffer, simply accumulate all the bytes. */ @@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -/* - * Make sure that these variables are appropriately initialized when - * SSE2 is enabled on the compiler command-line, but the compiler is - * too old to support CONFIG_AVX2_OPT. - */ -#if defined(CONFIG_AVX2_OPT) -# define INIT_USED 0 -# define INIT_LENGTH 0 -# define INIT_ACCEL buffer_zero_int -#else -# ifndef __SSE2__ -# error "ISA selection confusion" -# endif -# define INIT_USED CPUINFO_SSE2 -# define INIT_LENGTH 64 -# define INIT_ACCEL buffer_zero_sse2 -#endif - -static unsigned used_accel = INIT_USED; -static unsigned length_to_accel = INIT_LENGTH; -static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; - static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - unsigned len; bool (*fn)(const void *, size_t); } all[] = { #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, 128, buffer_zero_avx2 }, + { CPUINFO_AVX2, buffer_zero_avx2 }, #endif - { CPUINFO_SSE2, 64, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, 0, buffer_zero_int }, + { CPUINFO_SSE2, buffer_zero_sse2 }, + { CPUINFO_ALWAYS, buffer_is_zero_integer }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { if (info & all[i].bit) { - length_to_accel = all[i].len; - buffer_accel = all[i].fn; + buffer_is_zero_accel = all[i].fn; return all[i].bit; } } return 0; } -#if defined(CONFIG_AVX2_OPT) +static unsigned used_accel; + static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); } -#endif /* CONFIG_AVX2_OPT */ + +#define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { @@ -194,36 +173,37 @@ bool test_buffer_is_zero_next_accel(void) used_accel |= used; return used; } - -static bool select_accel_fn(const void *buf, size_t len) -{ - if (likely(len >= length_to_accel)) { - return buffer_accel(buf, len); - } - return buffer_zero_int(buf, len); -} - #else -#define select_accel_fn buffer_zero_int bool test_buffer_is_zero_next_accel(void) { return false; } + +#define INIT_ACCEL buffer_is_zero_integer #endif -/* - * Checks if a buffer is all zeroes - */ -bool buffer_is_zero(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; + +bool buffer_is_zero_ool(const void *buf, size_t len) { if (unlikely(len == 0)) { return true; } + if (!buffer_is_zero_sample3(buf, len)) { + return false; + } + /* All bytes are covered for any len <= 3. */ + if (unlikely(len <= 3)) { + return true; + } - /* Fetch the beginning of the buffer while we select the accelerator. */ - __builtin_prefetch(buf); - - /* Use an optimized zero check if possible. Note that this also - includes a check for an unrolled loop over 64-bit integers. */ - return select_accel_fn(buf, len); + if (likely(len >= 256)) { + return buffer_is_zero_accel(buf, len); + } + return buffer_is_zero_integer(buf, len); +} + +bool buffer_is_zero_ge256(const void *buf, size_t len) +{ + return buffer_is_zero_accel(buf, len); } From patchwork Sat Feb 17 00:39:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773699 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206568wrs; Fri, 16 Feb 2024 16:40:32 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCU8CZjnziF9WwXpXvwM05wTDag7SRVRCsRmYdmhWg8vlFMEAkqFyVNIoLHrukGW8VMqVbSpU///KdTWg7RibAzu X-Google-Smtp-Source: AGHT+IGi1dzBMweVK181GcS7zDKraaS0+8+mYuBf+r0EdgPl321+90RhklfudeWpChvoGbQwXUm9 X-Received: by 2002:a67:fc94:0:b0:470:41e0:bd07 with SMTP id x20-20020a67fc94000000b0047041e0bd07mr335260vsp.10.1708130432753; Fri, 16 Feb 2024 16:40:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130432; cv=none; d=google.com; s=arc-20160816; b=F3e1iZk9Fhj4ZVCDSFR+L5hxMmut/YrpoeSeUSD3KR3GA1+ge/+dkH0wXBjoziza6i kYyQqT16SYn82fv1YRDbEeJbe9mi9xY+j4Wv7fpuzV4lH8nqiU8u4B+XARQFNCUb2kNn RL4vb3YPEVh0EMJP2ArsupEIZkX5sz3D/OZTTktjSUe0dtkoNscRmUWLV5y4Cqn26GiI cU6cUje/+/CXB+3HwwrHHXns7vRiB5DnKRjg9ftwcZqws2U1k0mUVpm884nygjorpVPC 5uqmZCak4Ls76yJ2KFmkC61JsdOLWsUUbUyelzx9Pr5R8gTPa6Rz3BIpD5Mque0/VxLf nJQw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=kr0xxKeiBX3K75HPMsiREaMmoXx6Unw80Qq/TYj4dJ8wSWWg/6YIq1c3Lw2ppMZyNy LglNfHWoySy7fHdpA3+Ho0ajTfCfGm7L7nu0M3Ddj4h+8C3PhXEyoeS9LjkSTngQoPLb vG7fQPbCKPBLhkpQaLPjVCkdVG2E/2wH110MtoD3ESFgLC+q1vuH33iXgMo+uJ7zh+hv jQzyLHlQvPKygQBTyTsatUlglyUOutLPD4h1qPboj9DSE6iPH2yGDJPbLnaC/98pIY0Y eLg24o7zHCMc6e4vTA4v38/iAQNqZLtUISAKKEpubumN63jseIBibDQwAAP6APzphJMQ 2oNw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=s6Lt3QDf; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id d9-20020ac85d89000000b0042c260dff43si1171469qtx.637.2024.02.16.16.40.32 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:40:32 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=s6Lt3QDf; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8js-000666-Ln; Fri, 16 Feb 2024 19:39:32 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jr-00065v-Cq for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:31 -0500 Received: from mail-pg1-x534.google.com ([2607:f8b0:4864:20::534]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jp-0008G9-Mv for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:31 -0500 Received: by mail-pg1-x534.google.com with SMTP id 41be03b00d2f7-5bdbe2de25fso2208161a12.3 for ; Fri, 16 Feb 2024 16:39:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130367; x=1708735167; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=s6Lt3QDf6aOIOcLl+25gn6fa4ukCa1aQIhcjxdY6QUG4BQm0ozv9YDcW/bd5pFmV20 gPCxu2OAhJonu75T5dHDZhwZ+17dvH85a1n9xou0Fn+dg0sDwWey+xf+o4Km2A4l7wjx X/bVHzmCa8MY6qeHCCcddo1SWXmX4G9UBgqOJIYuktACIT5zzUi1fiBRC6o/uqmSW7v/ Y2MizLqJf48dNC6Ok1gKpQeHF/UAB/JuxsPVWBaSUyAIEm0lsD3H1hmxHLbfjv8ygKOz cdX98190GfluMGbZaVtDQXknzrCfizSW6yTJ60GqrlqEqnyAFbkeSoqyZy3T/pjKisEF cZ8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130367; x=1708735167; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=Lu3YHPPNFXOXpzV+f+D6OD35CACvAnUMnoI6DvzL842Q/ckBstFXDMqu48lUkLjzFs /AJyMAbVLi88uLC+H1MW5txw4k2ErfN4tck1CqgAgpS138TWGT72PNpaUCq4SdhRck+t xWCvjWbQZp0UeuEote7nInpYpZaT3KSy52KVg/jpN8Old+BU6Py51xthpWISDveUbycq bf6FF02zzTXu1ZUDjYAYayqbq86rp5dSPBAn+CqRSPhBEOseQmFuFxoxraIg+COwmAFw RHyddqvSwpr41kdLTGW7vP6rsK0M6WN6JOalVQrzlq5mg5pOSv0o7AUhKPF/YF4WZifM 4kvQ== X-Gm-Message-State: AOJu0YzJChpeNqqyWcOcNi6R6utnkjwxO9sCAD/m0u9re459ij0Rm16S GIQ2v6Xa0z/Fr7LMbFVU+nKyN7aog6fWuPhBwVvYGSg8oeeP3SesdyXnKr6arl9NcNXv1xSj6Jo r X-Received: by 2002:a17:902:ff0f:b0:1da:22d9:e7be with SMTP id f15-20020a170902ff0f00b001da22d9e7bemr6067832plj.23.1708130366822; Fri, 16 Feb 2024 16:39:26 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:26 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 04/10] util/bufferiszero: Remove useless prefetches Date: Fri, 16 Feb 2024 14:39:12 -1000 Message-Id: <20240217003918.52229-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::534; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x534.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Use of prefetching in bufferiszero.c is quite questionable: - prefetches are issued just a few CPU cycles before the corresponding line would be hit by demand loads; - they are done for simple access patterns, i.e. where hardware prefetchers can perform better; - they compete for load ports in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-5-amonakov@ispras.ru> --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 972f394cbd..00118d649e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); for (; p + 8 <= e; p += 8) { - __builtin_prefetch(p + 8); if (t) { return false; } @@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len) /* Loop over 16-byte aligned blocks of 64. */ while (likely(p <= e)) { - __builtin_prefetch(p); t = _mm_cmpeq_epi8(t, zero); if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { return false; @@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len) /* Loop over 32-byte aligned blocks of 128. */ while (p <= e) { - __builtin_prefetch(p); if (unlikely(!_mm256_testz_si256(t, t))) { return false; } From patchwork Sat Feb 17 00:39:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773703 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206701wrs; Fri, 16 Feb 2024 16:41:03 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVDkwI1+MjXVzqU+jvBF+sWYgPD5LxbT07Iy19zG+KmGIDmyt+IxHgk+3T5qVhnGXhMQerFRqRxwJNidkuVTmiS X-Google-Smtp-Source: AGHT+IFTiVNRxVZW4iFA5nyiyhi+V313CExDhUTraUmeFAhc8MRIw1rUoYf/ZFZ3WQdvp5r3ms+2 X-Received: by 2002:a05:622a:11d1:b0:42c:766f:606d with SMTP id n17-20020a05622a11d100b0042c766f606dmr8259196qtk.57.1708130463186; Fri, 16 Feb 2024 16:41:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130463; cv=none; d=google.com; s=arc-20160816; b=HmRcH15dRDL9M2W2c8l+JFAxMYiuou9qevIWWJ02HvdCQOokj9Z+EUEJDY1WLcmDEH v6lUiGEeS5++Dll1tENIUY8plFoDYsJu1oLL2nUFBTRrwZWmqgzCTREVUN5Q0SIcu3xw QxyxKp2zO+lruXqcT2D6KlCLY9XFRWRbhi2I+hw7g0ZWjos8n+MbA8om9Cnt3OlyP4DS FoBRt9uaxzkNf1+ri5+5p5ltDuUBzVPDwXtl6AvlIQ8litrfASYVPPKMULVOqTWfRg9h L8pjnpDC6MRzfHtFVnAo184VmKft0EK9eQ7BGmUfAak6116aEZNyBzUSbq31fqxDjhwW mnEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=T6jccuH2jR2zhEWzcs6GfbHA774XlAzSy+VTEnMmoHJDseEAfGtzbZsUPNd6tMe6UO tZE35SuNhbmks6ZOg4NYYuRm7ndbyOklalKbMheSEkjOOsmIYWeA+SoTmq5nkb4RAw4a L3hVCgefDcAVDlCW+nYyXTyg1Ssdm6LXFtIfRgCo44I/LmCCVN5QNcToIUTzo2AfqeRF vI4EFXTcEJ5YeFE+aCLsKT2Cp4NcxVGrrsImNPv+siq7cgFO5v2O1PX6MoSyX+2XOg1r P+wI8iEa2fO04kPVIGH+hanNAbpp4oVdf6G1l0+ENPVFS5qMlzuwpetcwcztitjqSKi7 Sj/Q==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=FKsRGgfH; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id f4-20020ac87f04000000b0042c68947664si1197623qtk.187.2024.02.16.16.41.03 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:41:03 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=FKsRGgfH; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8ju-00066c-Am; Fri, 16 Feb 2024 19:39:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8js-000667-LW for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:32 -0500 Received: from mail-pl1-x633.google.com ([2607:f8b0:4864:20::633]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jq-0008GM-OR for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:32 -0500 Received: by mail-pl1-x633.google.com with SMTP id d9443c01a7336-1d95d67ff45so21733505ad.2 for ; Fri, 16 Feb 2024 16:39:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130368; x=1708735168; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; b=FKsRGgfHR56u6HHcfZvVdv+jSjAbUepYWQROWMJ01sdx/9Hnjvc3M5mqyjA5CGXepL F4PF9Thr6BUuYuKLau16B0o2bQxmm37rv3SeCOD0GT5QYK5D8mMkZope45xsTeiDGHDX Z+BHbFvlmBcXSbaeIuJeduyrHy+AwULMcOPcbw0U2pJEqQyZJ2wNKCVt6wzy/PaRFcVp kejpPeXZP+l8ZxkSU8b12OLi7LpHFfb/V0/WQnYStSpdyR9wIniTiGLtyRLlBHRzPR/k dvu2VIlv5vag+LmerUHbCs50R7giW3rUPhgWbHOxoMkDus1uwdebLGSpP9aXeWnmY5vG PrqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130368; x=1708735168; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; b=eBSD+oCwjYfiwMRI1mrBU6J58APXYvPSBDpQAglrrbUk5C9bzEFdhxnmB/TtptKIES 9QBfgfcmYop0hL9mwptqct8wN7yyXijYM79sPN12WCryVQo5PisT0cU0dCuBZE2VGs53 C+WlpHxD3naAaDk0cq+GgbGD6G0jdrI17FbqHfSrl/RXV6kwaPQLRnvBdqp9hXL7748v ueFz8vJZOSrUJRHCuGYP9uAMebqvR7zyH8gMQweiqUhiMC44J+A5pgciP5eim7obuZxq 6jS+mINl9N17Ab62chmLyXzY6N7fHCl4R6CgKddmAc7j9VPB2Y9U5fEUWrmdgyyRadsX wfjQ== X-Gm-Message-State: AOJu0Yy1DAx/myF1KAg9Rjl+KdGMLm/xANGA7Ucj3FSHiDidMSwM2D7R 4cY8Nh1mMYCTUkBd82bnC7kQkQDEaHb6KSgsDxHDB1NwtJaaUUi7d8n8knltXHm5xKlWgJJ8nbn f X-Received: by 2002:a17:902:bd84:b0:1d8:cc30:bb18 with SMTP id q4-20020a170902bd8400b001d8cc30bb18mr6096696pls.52.1708130368185; Fri, 16 Feb 2024 16:39:28 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:27 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants Date: Fri, 16 Feb 2024 14:39:13 -1000 Message-Id: <20240217003918.52229-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::633; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x633.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations). Avoid using out-of-bounds pointers in loop boundary conditions. Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-6-amonakov@ispras.ru> --- util/bufferiszero.c | 111 +++++++++++++++++++++++++++++--------------- 1 file changed, 73 insertions(+), 38 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 00118d649e..02df82b4ff 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include -/* Note that each of these vectorized functions require len >= 64. */ +/* Helper for preventing the compiler from reassociating + chains of binary vector operations. */ +#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1)) + +/* Note that these vectorized functions may assume len >= 256. */ static bool __attribute__((target("sse2"))) buffer_zero_sse2(const void *buf, size_t len) { - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - __m128i zero = _mm_setzero_si128(); + /* Unaligned loads at head/tail. */ + __m128i v = *(__m128i_u *)(buf); + __m128i w = *(__m128i_u *)(buf + len - 16); + /* Align head/tail to 16-byte boundaries. */ + const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + __m128i zero = { 0 }; - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - t = _mm_cmpeq_epi8(t, zero); - if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + v = _mm_cmpeq_epi8(v, zero); + if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + p += 8; + } while (p < e - 7); - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; + return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF; } #ifdef CONFIG_AVX2_OPT static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { - /* Begin with an unaligned head of 32 bytes. */ - __m256i t = _mm256_loadu_si256(buf); - __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32); - __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32); + /* Unaligned loads at head/tail. */ + __m256i v = *(__m256i_u *)(buf); + __m256i w = *(__m256i_u *)(buf + len - 32); + /* Align head/tail to 32-byte boundaries. */ + const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32); + const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32); + __m256i zero = { 0 }; - /* Loop over 32-byte aligned blocks of 128. */ - while (p <= e) { - if (unlikely(!_mm256_testz_si256(t, t))) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* Loop over complete 256-byte blocks. */ + for (; p < e - 7; p += 8) { + /* PTEST is not profitable here. */ + v = _mm256_cmpeq_epi8(v, zero); + if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } ; + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + } - /* Finish the last block of 128 unaligned. */ - t |= _mm256_loadu_si256(buf + len - 4 * 32); - t |= _mm256_loadu_si256(buf + len - 3 * 32); - t |= _mm256_loadu_si256(buf + len - 2 * 32); - t |= _mm256_loadu_si256(buf + len - 1 * 32); - - return _mm256_testz_si256(t, t); + return _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) == 0xFFFFFFFF; } #endif /* CONFIG_AVX2_OPT */ From patchwork Sat Feb 17 00:39:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773706 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206784wrs; Fri, 16 Feb 2024 16:41:21 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCXAB4QJZYHGfgVtsa1nBWv8o61yRgmqFOWWRWC0AuiRsMPC4M8jLMbDkpSxPOQ5UuNDyq1t8UBzqzEamExHV8Y6 X-Google-Smtp-Source: AGHT+IFMj1dL63E/6BRZOAQkeWWyHhGdwgTYDFz5AjlUHzPL0e8yRpiGCjZcz+oXhhVNJWEmNolI X-Received: by 2002:ae9:c20d:0:b0:787:3aee:32dd with SMTP id j13-20020ae9c20d000000b007873aee32ddmr5529528qkg.57.1708130480835; Fri, 16 Feb 2024 16:41:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130480; cv=none; d=google.com; s=arc-20160816; b=vKwN4kY5qOOeRjIkDZoe2rdPNi9uyI/OfnwfPcGcpItIJUKEDWO66eGFD75gxsLO3L rsDzvhI792Q5/ezrzjl69XiTO1qekaSf4d3niEG3W6wLhNh3D8diuAUE/lOW3N7FHQJO XBeLVEz4PE8ZOHFmBHG9xo7iHhfWJk5F5c2mmKanubeULJiyyOCXYTjvfGtCPZQ4ZPns V8hMxhF5SRXAlYGfdNNwl7/XqHmB0VmMe0ly95WMrdAnmmVhYzyjjPJDJCCx4jfeupzX sX6/iM2pcEcJhDHI358byd28faqxbyRco2DrYrJ05UZQJ9p4LPFXoDMDCIYnYD6aOYYm ohgA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=kPtJ2gcWQWVvl3eYL4tk0YOKmkXHenThqZDsJAXs+wo=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=JwkB6FF+yAouTg+yFAqwTIH4ltvhNh2hK4pnWLZg6q46hWORbZMrMi5Z8n+LfK3lv7 EiVqID6+B2a01CIBeOPh4/nVYDYcirrNocDI548DDdawxWg55BN+JKam8Lg36hfAD5yE 9gR9cf+ZQltXgQPpD9mGtzrXLFDwSFUQkuuGbi3Sk9aH9j3KVIc0xBihVJ975TMJjHE3 Zf8N2ujU0LCJnpy+FDTZMxtBkHX8RqPtqrLttfEImLyzyZItG36sGOA9vlu6ZQzaFGvK qLzH3H+yqDGkjLDYMVRE2y/2EpO5Zp75qe/yzA/b1eSnNrg9fm87w1BLyIlzQsQXIYn4 yE+Q==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=w9v4enFk; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id m23-20020ae9e017000000b00784aae26822si1026323qkk.583.2024.02.16.16.41.20 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:41:20 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=w9v4enFk; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8jv-00068D-2n; Fri, 16 Feb 2024 19:39:35 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8js-00066B-RI for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:32 -0500 Received: from mail-pl1-x62e.google.com ([2607:f8b0:4864:20::62e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jr-0008Gf-5d for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:32 -0500 Received: by mail-pl1-x62e.google.com with SMTP id d9443c01a7336-1d780a392fdso13142115ad.3 for ; Fri, 16 Feb 2024 16:39:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130369; x=1708735169; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=kPtJ2gcWQWVvl3eYL4tk0YOKmkXHenThqZDsJAXs+wo=; b=w9v4enFkGQirt/QzMAfau+pLcveXFzQyKo0fgxliLLY8+uiEAfDWgtaKZZJz9DGtyN CUyqks8yqI7JYZHLVafKx7zZEvN1zRfZiZ1h3DNtwH58xo4oOl6y+a7B74em2SaZdULs XpYvsLDhyMfr9rF1F+EGX7jAeUrFRZ5U8lniS72CYdA1gYZgrgO9TBR7YbMKM0W45Gwd OiePKYghCstYwPSQX7ZNGMv4QG6PgaaijaKBHFL5dw6O+nqNxDxSMNY+fKcDWdHK6EHF Jukl1ahAZTEa+Mw9aW/HEVeRose0O6C/QhUuWp0Spb5qyzwlsTgkOCtwXSozHNznDqxm yAuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130369; x=1708735169; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kPtJ2gcWQWVvl3eYL4tk0YOKmkXHenThqZDsJAXs+wo=; b=L4/IUPgj0kmHhv0SWXQeBReX0TdCq3uju10HPIZFABDGzfQtYBpYVe9wlaBCdh1O2m 8iLCvCWllWx5Md+R1VdtX21AA/siRDHJHOzEh5oyNq0RU7UjcAVKkbvNQzKxGUKiCvrP vQdp498s0WxxxjzOjRJGFRgvfLfB/N7TeIadtWq81u5uOs8PqDUhfAdP7a4erG1zwJz4 tNcKdDEIHgURsaCLI1dDgJOvDTDn5Pm/I7qGZseXsjc7ArG7dfMczp9KFclyedXhYq1Z AZJ++8irXfYCRmnNAxJ8DCwY4aA+ItFsKAyn1sz09K0FV1nlxe9e4lrGdjh6TlzP/Wly AjAQ== X-Gm-Message-State: AOJu0YxVayY8aR9G1xe1DHtE8vdcE3/KO2YseCmJf0llMnjPQS2X1Kk3 2luTQlF5Obc/O4JOu5acbYWhJ58oE21zfqeEGhbqgTuLPxmOAna0iPgPhHJ3Rkb2rtrkkOgpGlZ F X-Received: by 2002:a17:903:2352:b0:1d9:bf90:2f1b with SMTP id c18-20020a170903235200b001d9bf902f1bmr7600846plh.53.1708130369517; Fri, 16 Feb 2024 16:39:29 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:29 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 06/10] util/bufferiszero: Improve scalar variant Date: Fri, 16 Feb 2024 14:39:14 -1000 Message-Id: <20240217003918.52229-7-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::62e; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x62e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Split less-than and greater-than 256 cases. Use unaligned accesses for head and tail. Avoid using out-of-bounds pointers in loop boundary conditions. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 86 +++++++++++++++++++++++++++------------------ 1 file changed, 52 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 02df82b4ff..a904b747c7 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -28,40 +28,58 @@ static bool (*buffer_is_zero_accel)(const void *, size_t); -static bool buffer_is_zero_integer(const void *buf, size_t len) +static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { - if (unlikely(len < 8)) { - /* For a very small buffer, simply accumulate all the bytes. */ - const unsigned char *p = buf; - const unsigned char *e = buf + len; - unsigned char t = 0; + uint64_t t; + const uint64_t *p, *e; - do { - t |= *p++; - } while (p < e); - - return t == 0; - } else { - /* Otherwise, use the unaligned memory access functions to - handle the beginning and end of the buffer, with a couple - of loops handling the middle aligned section. */ - uint64_t t = ldq_he_p(buf); - const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8); - const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); - - for (; p + 8 <= e; p += 8) { - if (t) { - return false; - } - t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; - } - while (p < e) { - t |= *p++; - } - t |= ldq_he_p(buf + len - 8); - - return t == 0; + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer, with a couple + * of loops handling the middle aligned section. + */ + if (unlikely(len <= 8)) { + return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0; } + + t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + while (p < e) { + t |= *p++; + } + return t == 0; +} + +static bool buffer_is_zero_int_ge256(const void *buf, size_t len) +{ + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer, with a couple + * of loops handling the middle aligned section. + */ + uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Collect a partial block at the tail end. */ + t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1]; + + /* + * Loop over 64 byte blocks. + * With the head and tail removed, e - p >= 30, + * so the loop must iterate at least 3 times. + */ + do { + if (t) { + return false; + } + t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; + p += 8; + } while (p < e - 7); + + return t == 0; } #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) @@ -173,7 +191,7 @@ select_accel_cpuinfo(unsigned info) { CPUINFO_AVX2, buffer_zero_avx2 }, #endif { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_integer }, + { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { @@ -211,7 +229,7 @@ bool test_buffer_is_zero_next_accel(void) return false; } -#define INIT_ACCEL buffer_is_zero_integer +#define INIT_ACCEL buffer_is_zero_int_ge256 #endif static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; @@ -232,7 +250,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len) if (likely(len >= 256)) { return buffer_is_zero_accel(buf, len); } - return buffer_is_zero_integer(buf, len); + return buffer_is_zero_int_lt256(buf, len); } bool buffer_is_zero_ge256(const void *buf, size_t len) From patchwork Sat Feb 17 00:39:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773700 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206572wrs; Fri, 16 Feb 2024 16:40:33 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVfgwgroHWBzRtovCJPDMuMEhX8HbOJb/w4xqhTEm7KmVND8KCJhhPxSA5oJOUZRlyEof4dHUuLCHX2TCOVJpUd X-Google-Smtp-Source: AGHT+IF5Wd1Gmlgd+ocY8+8a8Q2qGHVKdWHwpMYz+89ftqXV4wRwz5b2TxP3SYvGGj+l9Zqhi0/6 X-Received: by 2002:a0c:cc14:0:b0:68c:e81a:6618 with SMTP id r20-20020a0ccc14000000b0068ce81a6618mr6084645qvk.10.1708130433004; Fri, 16 Feb 2024 16:40:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130432; cv=none; d=google.com; s=arc-20160816; b=kOl9Zs14sGSnvs58kESSV5rErnNVP3oUnkN5lsRjdrIs4f1dD1I1S+OnSuZwENHXyW Tz/CiNjjAc3NQH/9gCiKRakpnPlbQ/QXoLNYMhsgXb53knNORdCf2xJmNyyheps+4aNP 6DTdgBGdVtJYEwClM2g9+P3W9mUBtEd0TBDngnfUzUi76M7f6OWqDPR48Kjeoxdzay2n ICXAHYaWC+ULUphnMjF/zT7j3t6lPmRN3yy8Em7mEzbU9pBRNOLyB6wY9m0WHX+mq3dw pQ0+/yaLAhXJBhAhsKJVyyFszGv6cgX/Guw/TBR3uWENleN8Cd69Ae626DB6y+P2tMlV z1Uw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=ijJGORoBUm3OaIKDGyVcLAHo3HCT5x1FLS32130PX9o=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=DcwxqhCpK9gs0Y+rzqK4BtW24GuvqiPmb0JmRV72kD9xkK15IZR4VJkhiG3hXcRGQ9 W4eI6EmeYdmY5NSX8/ebYtPVsaGjZlYjBCyg1bPD3aHfIt7oIyA/eyfISN9psMUW02L4 IphYO0ubDUAPwCNnK1owzH6G2kDlfqsuSX2uGPKRvwxpQfAciBokC5xhjL8lWkeELy01 VlFSIH95AefK4QnKtlsv5MvGU5g8vpFGfBz/qc0/pfOzs+utNPFWEyGRNAiCnNRO1q9W Gz6c7DwplG1czQ7pHueJucLHyH/hhiJpAZNZlzOUTNVqta3hrr0Uwr0cahQyyWY24q3M IgoQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=mseDhCwh; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id p6-20020a05621421e600b0068ee94fc0c0si1015998qvj.1.2024.02.16.16.40.32 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:40:32 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=mseDhCwh; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8jw-00068R-BH; Fri, 16 Feb 2024 19:39:36 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jt-00066S-O5 for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:33 -0500 Received: from mail-pl1-x634.google.com ([2607:f8b0:4864:20::634]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8js-0008Gz-4d for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:33 -0500 Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-1d72f71f222so10656755ad.1 for ; Fri, 16 Feb 2024 16:39:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130371; x=1708735171; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ijJGORoBUm3OaIKDGyVcLAHo3HCT5x1FLS32130PX9o=; b=mseDhCwhM9L63/KctK9w/i8kVh/CYQU7pea2pQewzp2IIhgK0f2Z0NjBHOvPpVaAEM yrKydpx+Z24fV0o741+oZBPiOdY1QrCc4HgbTKV4q5BrcIBa6P54JrARznQuPMgSzvTS nVNSMJc95xV1Ro1oUKfB+nS3sMrXYaDn7W6ce146lQgDYPJ5Ukywo5unwe4OJ9SZ7t2Q 5wMI2jpXZqnkh8LdaZmv9XxnoTIBMbArqgfq0XC8pyGOjLFUBvLVvyQj28rLovkRrNBQ MwjzQGQtqWi5Rc/LADW6H2iGcgy1YnWc2V+uQb02gE8mdmfjS6fJ22TlbITysmSFYQuB JcMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130371; x=1708735171; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ijJGORoBUm3OaIKDGyVcLAHo3HCT5x1FLS32130PX9o=; b=ZjhdQwUSPnWHT6TLk6OMtyuGfyp669g/gkW2idVWL5b1UE8yksru+fXvioVd+aQq3X xC4v5i3xN3gPDlvhsWh07Gw4YxyG0tGYfvG69I6PMGQzfm8UBtXceLcqQAJE8PmmdLNm keMg5kd8qd7ubyQTGQUUcfNsXtFCcl6gaeiu8ASDQuoPZyZrFRD0eZkXUlkz/OZB0s1Z 0LRsoSfqeS0kQDB4/WDbErMM9eisrtAPmtqTuzzGY1I5J2sDfhtc0I+M51438L5rDS2M i3DHGnVXAQqu0558Xsc8s23gX1RcunRlgt9IInSdkGxLxEmWsODejX3ldBa8DPMhYQ8O Q6jw== X-Gm-Message-State: AOJu0YwqW7uqfbLHFuiCwEezYICGy1P4b93BD+AAPDIyRA+HXbgr5nm/ 3WYn0CN8Ja3tS3eFzOxWyvJfcAfadlkB7jpgrl3dCUdGWtUYTRj6aipm6BP2MYiPTK0FNWeQQhc z X-Received: by 2002:a17:902:650c:b0:1db:875b:efb2 with SMTP id b12-20020a170902650c00b001db875befb2mr5946519plk.10.1708130370896; Fri, 16 Feb 2024 16:39:30 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:30 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 07/10] util/bufferiszero: Introduce biz_accel_fn typedef Date: Fri, 16 Feb 2024 14:39:15 -1000 Message-Id: <20240217003918.52229-8-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::634; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x634.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Signed-off-by: Richard Henderson --- util/bufferiszero.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index a904b747c7..61ea59d2e0 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,7 +26,8 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool (*buffer_is_zero_accel)(const void *, size_t); +typedef bool (*biz_accel_fn)(const void *, size_t); +static biz_accel_fn buffer_is_zero_accel; static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { @@ -179,13 +180,15 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ + + static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - bool (*fn)(const void *, size_t); + biz_accel_fn fn; } all[] = { #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, buffer_zero_avx2 }, @@ -232,7 +235,7 @@ bool test_buffer_is_zero_next_accel(void) #define INIT_ACCEL buffer_is_zero_int_ge256 #endif -static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; +static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL; bool buffer_is_zero_ool(const void *buf, size_t len) { From patchwork Sat Feb 17 00:39:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773702 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206702wrs; Fri, 16 Feb 2024 16:41:03 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCX6dndkCIvpumYqWXxJ7DAMMmTf01c9szN2/0NvwU0NcKRk1r7LJQ+SUtDllTp7OTvPSpoRc8DR8NlFOWV8jJL4 X-Google-Smtp-Source: AGHT+IFZWzBEeRx2lqGJ0qi92bsRHp6YCMpeEA6E9PLKwGeUyaJt5mFj4aXgadIvOOyWgd7JP+ay X-Received: by 2002:a05:622a:316:b0:42c:710a:2778 with SMTP id q22-20020a05622a031600b0042c710a2778mr7149330qtw.60.1708130463279; Fri, 16 Feb 2024 16:41:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130463; cv=none; d=google.com; s=arc-20160816; b=Vnswf9jvsSQ8HGDSaDCGhN4JSnDf9eCjXZnZ8jA50t36AkeT44T1Wn4jno1/083vHF 8VHasfm4ZX0MbO9bQwXdeqXmnqElC+F6v5hMUBMVTUsXDzf2MOTUI+UIdyxM7WwDH5/9 bw7lzK9ui8U8Vru6a4ssXKYHD0hl3UN5YmFuCGu77EyhW3TAUjw1JEuZtyJMTCVfH0sW UPfRLy3Qz7g09eFmlVd2HkS1aTrlQcD35anQOXd+LXE1/JMYjGDaNjvu2tfDmM3VD65M Px4aHHYdM6fZQazIzQDZdKnw3WokHavZ1u97zpP2Jjej6y6eoRB0KuvYKxAtQmqDnoGs 0HXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=6+vMZHnTNPjSijzEPZ6pE1ZEQb70DOSnKuGgl762H28=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=gLODUMsh+ddNAJFnvQKkCQrqgZgKykc4fc+cVopAxnyPePaGSeL9TmEm/9lswmEUQP iwQ7jCb41ltSC34Oz6Ob8GjYiE0cJ15Oo90gk2w61LEncfpKflv8O0f921FtOBSNhuT/ IX1rS7YDJ+hBsdimo+5AWtvjPFkGzfScjerryZRCV0egIc+i+tOV+jm4EUy7JvQbIqfI 6Q/irg/rZsPh/+iG8qjlNN+vn27vsgf2GLC/I67aoYlt4Ebe5hWIwr7I03CPclWF1U7p iKCKlormHEhSaCifmlgUwnk8SlbFP4n3A7KcrdcfbqAoQA52d/tRc4l0LU2mhgNHKnUX wZrw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=pTpzVQ6K; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id jr14-20020a05622a800e00b0042c44c065e6si978052qtb.733.2024.02.16.16.41.03 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:41:03 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=pTpzVQ6K; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8jy-000694-2Z; Fri, 16 Feb 2024 19:39:38 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jv-00068G-5c for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:35 -0500 Received: from mail-pl1-x629.google.com ([2607:f8b0:4864:20::629]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jt-0008HG-HK for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:34 -0500 Received: by mail-pl1-x629.google.com with SMTP id d9443c01a7336-1d73066880eso25546425ad.3 for ; Fri, 16 Feb 2024 16:39:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130372; x=1708735172; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=6+vMZHnTNPjSijzEPZ6pE1ZEQb70DOSnKuGgl762H28=; b=pTpzVQ6K1hAWblQycMsIJ7wnX4N4Upz/AKqBuk72pAFbryGG89nyBR0xK1Elglt7Kx by3MbFawWOvT6AEiSkCTCp77eZSFc7PphIeMCQLYuZLPAv4URWHxc4qFtAbraXr2gs4X Zi3Q07WZuZ6ikdrYrF+NGCdYhRv67ldooWzkHhGN0OcsTGpCTZH8Wd3iQ1q9W7N7O384 iYU7RYO1LWgLAbN6iL4R28YEaoFnhabh0q3FRBhpR8+vICywFoIaxMHuFNv+Vu5H5ayt OmmbWM9jXM7G7VjMQVZKEbZHenmsElanYEU+4Zsc04SNzxd9/RLxt3OnihUqtYBvvnFj tkvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130372; x=1708735172; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6+vMZHnTNPjSijzEPZ6pE1ZEQb70DOSnKuGgl762H28=; b=AlDjFOttL2biPLkGqgIywmzqdpZuicJEo/TuoaBgMg/gabT/mrir7USEcAiJJA/x66 BGuuwRNSouuYTwgPs5W2a9WiAPEeV40ju1wh6BT57ohA5EU0B6LjYaAFlCNcOiCpO6ud GCk+7ta646/bO730tgxpy063WNw5ao1hPjFVqVo+EGa9YfSnfC2sO8Hn3LihhgUCeIPx lqn5FAQfRy+lG7rNXh6Sq8KUfzn9GWhS3JRhePsUTYkWv4VcjKQtG9yml8bIxBhs3frr pjuPu8xrrtugwZvEI+PdEjpNhZ3BFbpHr5PuEPH3ubOiw+w43Vzk61IdcEs7oy29WEC4 RCBQ== X-Gm-Message-State: AOJu0YxIyE+VUq9MloMFKTKxoLykp8c0/+ZQRA0rehXRfYEVqipfoXKb KeVqM+trklUUY6GFLO2lbCV2paTA7fwidM7MIivd4NcmLIsGqs70mm6pnPMB1VatvS+8btzI4u9 W X-Received: by 2002:a17:903:32c5:b0:1db:c348:60d8 with SMTP id i5-20020a17090332c500b001dbc34860d8mr1153996plr.23.1708130372163; Fri, 16 Feb 2024 16:39:32 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:31 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Date: Fri, 16 Feb 2024 14:39:16 -1000 Message-Id: <20240217003918.52229-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::629; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x629.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because the three alternatives are monotonic, we don't need to keep a couple of bitmasks, just identify the strongest alternative at startup. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 56 ++++++++++++++++++--------------------------- 1 file changed, 22 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 61ea59d2e0..9b338f7be5 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -180,51 +180,39 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ - - -static unsigned __attribute__((noinline)) -select_accel_cpuinfo(unsigned info) -{ - /* Array is sorted in order of algorithm preference. */ - static const struct { - unsigned bit; - biz_accel_fn fn; - } all[] = { +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_zero_sse2, #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, buffer_zero_avx2 }, + buffer_zero_avx2, #endif - { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, - }; - - for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { - if (info & all[i].bit) { - buffer_is_zero_accel = all[i].fn; - return all[i].bit; - } - } - return 0; -} - -static unsigned used_accel; +}; +static unsigned accel_index; static void __attribute__((constructor)) init_accel(void) { - used_accel = select_accel_cpuinfo(cpuinfo_init()); + unsigned info = cpuinfo_init(); + unsigned index = (info & CPUINFO_SSE2 ? 1 : 0); + +#ifdef CONFIG_AVX2_OPT + if (info & CPUINFO_AVX2) { + index = 2; + } +#endif + + accel_index = index; + buffer_is_zero_accel = accel_table[index]; } #define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { - /* - * Accumulate the accelerators that we've already tested, and - * remove them from the set to test this round. We'll get back - * a zero from select_accel_cpuinfo when there are no more. - */ - unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel); - used_accel |= used; - return used; + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; } #else bool test_buffer_is_zero_next_accel(void) From patchwork Sat Feb 17 00:39:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773705 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206763wrs; Fri, 16 Feb 2024 16:41:17 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCVLmeNz0qJedZ8mZ8LenWCm2P/qCy/eZgQRkSD6dMDymwc6Y9UvwsMcQQ9xJdAQjwMQQw7xrixZzx4qXTfMYjzx X-Google-Smtp-Source: AGHT+IFlN2OqPzzyS/F8vRbMzJnT9MDP9MUB5M4b3pH7UCa4RcROUXfxlvyFrPdaJlXc6oZ4q/nt X-Received: by 2002:a81:ad1f:0:b0:607:b0d3:ebc0 with SMTP id l31-20020a81ad1f000000b00607b0d3ebc0mr6867172ywh.21.1708130477203; Fri, 16 Feb 2024 16:41:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130477; cv=none; d=google.com; s=arc-20160816; b=VS8NvKI/1a2HJT8Fz6XIlAKk76+12kgcLE6mhDxycgQoDSQDIjgaf7h1i9tbiZOVtA +rUui4BFH0aL4lW5bWR6ukM6CXGVyc9BI1X4xd68HOw7agM22Y80T2xreOn0DU68BKPw cletNbtrHn5Az4435c0eRw7vLEpAugdkn4lUV3tiTj/InUL4gC8diVNn2qfvr2hn0L4B XxsBvMhuBqJNfYQQ71ji404Us4s6bF7zvqorVK/Atj5JJL3Jv2nIg/ezt3Jmbcm2P2HP ZRXSfwuSz2DSpdSIXVZibM5+X5kZiTgt9O+wDlGyfm6XJIwGX7Mm3TWKlhVYOHS44VPa Ao8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Mm2M2ZI3+sRAHkQ9RNGmfsw8c3ap5NOHL92KrHqrl90=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=cQrmvmfVK71qfYvqy4HeGRIFrbdVOGQkPGAgbdzxUFDgHm9fpov4sQlT83OPZ4yPFi bBJqyZyy8r6lhCPSu5FicW4wPj2cnOrWMOFao2gulpx+nj7Jh9pVjO0AdAl2fbYcaMXV MbNFnStCO4alnoD3aipRGmZv5EhylGODoF5+3wbTK8DTb5WQUILW1Hyyi4aqduIrvoPR hMigdBwBpMv4fKT6GuEDVqO6ucC7YxE4eSH0AUR7mteubb421Mj4yHZhc+d+zFa8Eg6i on+qMGOFcbHP1DnGt1yNgRjduttLFjdHIE8uCcrw1Cwv2GZ35YhJnpRxMNdXy5cwgzCW r+CQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=kA48xw66; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id f18-20020a05622a105200b0042c7cdc546asi1185397qte.423.2024.02.16.16.41.17 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:41:17 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=kA48xw66; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8k0-00069Y-37; Fri, 16 Feb 2024 19:39:40 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jw-00068p-GA for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:36 -0500 Received: from mail-pl1-x636.google.com ([2607:f8b0:4864:20::636]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8ju-0008Hf-N3 for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:36 -0500 Received: by mail-pl1-x636.google.com with SMTP id d9443c01a7336-1d72f71f222so10656845ad.1 for ; Fri, 16 Feb 2024 16:39:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130373; x=1708735173; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Mm2M2ZI3+sRAHkQ9RNGmfsw8c3ap5NOHL92KrHqrl90=; b=kA48xw66PORKmXsTOCzNEriag+NXbpLyWYyRxlnGrFs7j3gQkhekaI+StgH8E6tedj SSaz3/rExDjdDybYQf5hP14UbFAuvAY6IHyzPSx3BdZhsNhbGgMwkLHARcd2EC896Kw4 gEp7EeBMFWOokN8aiuqZr5nHAE/GiF2GM85W7Y200Nwi1OESSr90ebAM4lbkcHGgHoqT FhnSt0Iy5IZcACEAEedsT64Be2gG4wnjl20/D08Yn3OhZHoywKeo7+HOiB8viSdVH69p bHpIXVzmtMAyQgwHr72nqVG/bx/UEISRNg6f2UY50eaa4SGqnmAi7WPXWrax/TC36O6Y SO+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130373; x=1708735173; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Mm2M2ZI3+sRAHkQ9RNGmfsw8c3ap5NOHL92KrHqrl90=; b=Q4iO0nOYCmnD81FhUuobDV8Gdo7otT1OEoHkC4YWZubYlGryrH9143YF3Cg8BT238u E8h51ewDdbCGY7AjoMTpckEPb/ZZn8Gt4dWq7AY8tvloz7a6tVyo/bLfoBKypsXyaTNh vG9DJnO+1JjyZGZW+pNr/j74NFigTH7J9U+cnAJyk62oSoPdj+aVjdBeOkrqB3enzFkC YHGCGpZx2QWB3PFtVzlL+e2MZXK//TeewXB3BRWEiPmdWYkN929aHKSpe/ACsFp1IIBI HjXmBpYAi6zCXTA9fiWNX2Sb8IVS8tbrNlDbkN4CrQvWLGIGCcTGoZnIgyBNWaEKVbOS Vuhg== X-Gm-Message-State: AOJu0Yw9EYzw8XzGil1y+3VxgcV3J3TMPWpvvOjO5J2NXKs9IByKzuc+ O3Z4G+ymz4SPvCyDoYbte3CT+9RkmWuPgQuZT4birCq0jCQz/bJwfrRiUbIW/GIfq0DfnOnIpIZ V X-Received: by 2002:a17:902:784b:b0:1db:4b29:9b21 with SMTP id e11-20020a170902784b00b001db4b299b21mr6122978pln.23.1708130373394; Fri, 16 Feb 2024 16:39:33 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:33 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 09/10] util/bufferiszero: Add simd acceleration for aarch64 Date: Fri, 16 Feb 2024 14:39:17 -1000 Message-Id: <20240217003918.52229-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::636; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x636.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely double-check with the compiler flags for __ARM_NEON and don't bother with a runtime check. Otherwise, model the loop after the x86 SSE2 function, and use VADDV to reduce the four vector comparisons. Signed-off-by: Richard Henderson --- util/bufferiszero.c | 77 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 9b338f7be5..77db305bb0 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -214,7 +214,84 @@ bool test_buffer_is_zero_next_accel(void) } return false; } + +#elif defined(__aarch64__) && defined(__ARM_NEON) +#include + +#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1)) + +static bool buffer_is_zero_simd(const void *buf, size_t len) +{ + uint32x4_t t0, t1, t2, t3; + + /* Align head/tail to 16-byte boundaries. */ + const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + + /* Unaligned loads at head/tail. */ + t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16); + + /* Collect a partial block at tail end. */ + t1 = e[-7] | e[-6]; + t2 = e[-5] | e[-4]; + t3 = e[-3] | e[-2]; + t0 |= e[-1]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + /* + * Reduce via UMAXV. Whatever the actual result, + * it will only be zero if all input bytes are zero. + */ + if (unlikely(vmaxvq_u32(t0) != 0)) { + return false; + } + + t0 = p[0] | p[1]; + t1 = p[2] | p[3]; + t2 = p[4] | p[5]; + t3 = p[6] | p[7]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + p += 8; + } while (p < e - 7); + + return vmaxvq_u32(t0) == 0; +} + +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_is_zero_simd, +}; + +static unsigned accel_index = 1; +#define INIT_ACCEL buffer_is_zero_simd + +bool test_buffer_is_zero_next_accel(void) +{ + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; +} + #else + bool test_buffer_is_zero_next_accel(void) { return false; From patchwork Sat Feb 17 00:39:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 773698 Delivered-To: patch@linaro.org Received: by 2002:a5d:4943:0:b0:33b:4db1:f5b3 with SMTP id r3csp206570wrs; Fri, 16 Feb 2024 16:40:33 -0800 (PST) X-Forwarded-Encrypted: i=2; AJvYcCWnoyqLQdzrfskvU1CPh4vY1sTApUsNXWeJuvBQDIZnhLMSrVEb+3p/b7xBuLUdhDrVSo/wnvM0az4LN6RFaNGl X-Google-Smtp-Source: AGHT+IFKld8j7Hd2ecjYzjfpX/1Lm5SWNQTD9xiIZf3VABDisoJcvlrKrTrAMz+csdfZsHWUEXxP X-Received: by 2002:a0c:f3d0:0:b0:68f:325f:d88c with SMTP id f16-20020a0cf3d0000000b0068f325fd88cmr3281826qvm.1.1708130432951; Fri, 16 Feb 2024 16:40:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1708130432; cv=none; d=google.com; s=arc-20160816; b=Fhr4TxSiKZRPD7kuoMW7bfTgwEacqb8vFBSeDThYHVjEYiG5Vmzmq2+xYSatGuq+LZ 2IBHm2C3+vhia2NMbRSyt1xx3M8nGtZ/cXzsRGmuGKE67Hyj52pfeJfPx36WKhfPYRAt A7cAc+Wf2DqOmLFj7gBTR0CXDfOqxZBf1ogM3J5AxFXLkL6++yyE8gAxLA3jDHp+CnJl lIf7W48AnDYfHQmCebET/+98WlwoTmLa3rp3GQql1WWVzbD5OKAfdYHvm9wEiTRaVof2 ftCgNn9+Eg70KxnirzSXot9gMi/ejfUrSQxPaotxIOZVjlxtopNBr2JC3X8Rs94P3bME WZKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=q7NI6Uyf4Ttcuc/zyuTSXU3zmAAyzJ/7snw/vL2uCLQ=; fh=bXD9UGo7d05OYjVg0qF2wA28p2gxs496M6ftxgeDKVY=; b=HMKU5mpGURNgREflWMs7mrfBfagI29tWGSQI68MlcmmHRwCLupBJMQW7sVdWRnmMqI DFe+TN4TfF07xR1YbkpwN4NlQMCbojZIzNnagvuPMMzRwEMyi3HxoMFyXjUayKhDUdrp LactegK9w32Tj+jFMcfg3mn87NTGQrQSplzEeryOsiuKLemfX+fJolqxl0bI+IKmvDaU /RPIpuYHBqaqSAFpoMAGayCo+JjHuVV8qZwEIDAd0/7VYZvpEYw9oG1D20/K6AgjlRQE irqj1zupUVowqfXyp9N0DpiCcPMVXBJ40BokfQ/BMpT2ipqsQ5lg5hYsMZclG7eWjMoQ X2ZA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=A+7ECMEH; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id gm8-20020a056214268800b0068efb6d010dsi965073qvb.129.2024.02.16.16.40.32 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 16 Feb 2024 16:40:32 -0800 (PST) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=A+7ECMEH; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rb8jz-00069T-VA; Fri, 16 Feb 2024 19:39:39 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rb8jy-00069I-7Z for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:38 -0500 Received: from mail-pl1-x634.google.com ([2607:f8b0:4864:20::634]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rb8jv-0008Hr-S5 for qemu-devel@nongnu.org; Fri, 16 Feb 2024 19:39:37 -0500 Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-1d934c8f8f7so27591955ad.2 for ; Fri, 16 Feb 2024 16:39:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1708130374; x=1708735174; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=q7NI6Uyf4Ttcuc/zyuTSXU3zmAAyzJ/7snw/vL2uCLQ=; b=A+7ECMEHbBYmVaybAF8cM0cQgyB2QbbPGAmdw42c1eTrDWJgwnCWPUmiClNEG8MF4n LM5LJzKfRWTOcYX4M6zFPSI53BDOfy2a5CnS+wVJMcCkSXAFxNnMnBtMGlM5c0JpOZga c/lPjXDvJFL5Zpg2XheUVB9zVwF8Vyg1ftZ1Sms4vsY2Vti4QWo+7QL/RMt5jF+Wqww0 UhYwUwXfPo4qRyVoqD5Yenm9PhMlfuQoLa4FHn/eB3L9nK9ht7Cx/MJQUb9eC8Rmcmii Tka6xBilw+v9KNxiVbUjc7wSVA92oxPs4eC49FQPQLpES38G6V6g3i7hC/goIOWzp40L lE7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708130374; x=1708735174; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=q7NI6Uyf4Ttcuc/zyuTSXU3zmAAyzJ/7snw/vL2uCLQ=; b=BVVwxw9mviGPhGHdkKeSr6ukvvFI3wibSxf69z1TblsfC1PBQJMkRal5spGo2X/wsQ t3+uFbEit1Nomgy9+1zTTC+iZ7QzuqbKxSx/fWle/vjjrB2u0bZ5rJ91OCXJ7OGxby22 +F6qZB9z8AfrdOn/4VU2dMPd+SXYIwXry0s6Q4yzDjwL7m+htOWCKs42pnXktKuuDElL BzG4DnOsHnULTz6PcBg91CYFq2WZZxclCA9rybwGu+Ro0k/9Fw2an+pBgduyClyAHQ25 WRBgk28EajfCujj3VrNlmeQ/mhs+Vj4eoGmwu8TBEzMF1DflVpjMiL6scYq6ppcaxqR1 eYcw== X-Gm-Message-State: AOJu0Yw9fb8RpbUPlNruKyNP5jwehq08QgdkM87coK1AsBNCf9D8YZZ/ pi7f+XDOd7eVnt5oVhLj/h+MiLevL9lY/8l0mYtJgd1WXwrUgQ+r0/hEGq6CkxefDa0hQpXUU4z K X-Received: by 2002:a17:902:ce92:b0:1d7:147d:6a1d with SMTP id f18-20020a170902ce9200b001d7147d6a1dmr7161232plg.55.1708130374676; Fri, 16 Feb 2024 16:39:34 -0800 (PST) Received: from stoup.. (173-197-098-125.biz.spectrum.com. [173.197.98.125]) by smtp.gmail.com with ESMTPSA id z6-20020a170902ee0600b001d90306bdcfsm419325plb.65.2024.02.16.16.39.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Feb 2024 16:39:34 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Cc: amonakov@ispras.ru, mmromanov@ispras.ru Subject: [PATCH v5 10/10] tests/bench: Add bufferiszero-bench Date: Fri, 16 Feb 2024 14:39:18 -1000 Message-Id: <20240217003918.52229-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240217003918.52229-1-richard.henderson@linaro.org> References: <20240217003918.52229-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::634; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x634.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Benchmark each acceleration function vs an aligned buffer of zeros. Signed-off-by: Richard Henderson --- tests/bench/bufferiszero-bench.c | 42 ++++++++++++++++++++++++++++++++ tests/bench/meson.build | 4 ++- 2 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 tests/bench/bufferiszero-bench.c diff --git a/tests/bench/bufferiszero-bench.c b/tests/bench/bufferiszero-bench.c new file mode 100644 index 0000000000..1fa2eb6973 --- /dev/null +++ b/tests/bench/bufferiszero-bench.c @@ -0,0 +1,42 @@ +/* + * QEMU buffer_is_zero speed benchmark + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * (at your option) any later version. See the COPYING file in the + * top-level directory. + */ +#include "qemu/osdep.h" +#include "qemu/cutils.h" +#include "qemu/units.h" + +static void test(const void *opaque) +{ + size_t len = 64 * KiB; + void *buf = g_malloc0(len); + int accel_index = 0; + + do { + double total = 0.0; + + g_test_timer_start(); + do { + buffer_is_zero_ge256(buf, len); + total += len; + } while (g_test_timer_elapsed() < 5.0); + + total /= MiB; + g_test_message("buffer_is_zero #%d: %.2f MB/sec", + accel_index, total / g_test_timer_last()); + + accel_index++; + } while (test_buffer_is_zero_next_accel()); + + g_free(buf); +} + +int main(int argc, char **argv) +{ + g_test_init(&argc, &argv, NULL); + g_test_add_data_func("/cutils/bufferiszero/speed", NULL, test); + return g_test_run(); +} diff --git a/tests/bench/meson.build b/tests/bench/meson.build index 7e76338a52..70d45ff400 100644 --- a/tests/bench/meson.build +++ b/tests/bench/meson.build @@ -17,7 +17,9 @@ executable('atomic64-bench', dependencies: [qemuutil], build_by_default: false) -benchs = {} +benchs = { + 'bufferiszero-bench': [], +} if have_block benchs += {