From patchwork Tue Apr 30 19:42:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793362 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp437875wrf; Tue, 30 Apr 2024 12:43:49 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXwuLfb6+w1eMsXUfdMLRP3AtGJOK3TUEVy7Hke2phW5DBsk+B+K+bSYDqTkW+zGLwoChANNtHnqB3lN7vrXZ0+ X-Google-Smtp-Source: AGHT+IFQg1PHw3e9HXzW6Uz7SQ5zzXKyyppqZ226ORAFvs+vI+o4j4csN7wzrTsAPldElUH0GKid X-Received: by 2002:a05:620a:2805:b0:790:fc62:efd4 with SMTP id f5-20020a05620a280500b00790fc62efd4mr1501987qkp.1.1714506229091; Tue, 30 Apr 2024 12:43:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506229; cv=none; d=google.com; s=arc-20160816; b=iWY/OGaKzhnP8i8fomJPaENakLnskIVdDe1H3HrPSHle+OwOnRCb7/1O/NSdsoWsu3 smKuqEIE3otjWeglVPIO/TrVZkYo6fok92P4Y9u8xDi/wIuwdvBKEwMaoUXKlSSLNL9j ebg3ueOSSPfX2V7X4H9x+nFYuKjrQfzQSwCNGIk7jejH+cMzGoTsNWWAZyam0iIpejrq M8uhrexrBZcnoJZqfcnsANO4+BYP7EgqBvOBdyNbM/ZtsZ+K2w0aII9bfaLC30wORiUo dmA9YfS/nd017njsFhnKkXL8U0we9qwnqm4oKH/D17viWnHs5FYKVgP+kOkUqbek6yk+ 9dHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; fh=QtYERfLY+/6lXIVoMxqjAcWASzkMPfeQc3Ur4v8xQcU=; b=sktS4dCaUUruX6X4pWouV0T0uHrHVcUb5LfTpRQMF/HPTSDGvmzT3LohEZDT+zBJ/G ZzrEUR9jHqnyVdsjJO3ywVMTcIE6WnnB3T8vmIbISMoitUcpZX/twa5GcHlH34vASaDj 4rLBOwYWaRHutCA70T/0EASuZDkY/umMkVJxA5TntznA5Icgk3bJtE/X6j0ug7NugDtO iCPKhR8bQMe1mn1coCZWKiPiBJQ0LuwfykGMTaKnRyCOYIOJkpRdaEEfkPLEpPG/tVtz Cxzdav6KD0XOMM5vEXchTCuIBmn2sbZxC8spTqtucR5GvycFtLugtlywP8qbd7YQKKgA ofqA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jyrlgSxy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id pi4-20020a05620a378400b0078bcbec867esi28426608qkn.597.2024.04.30.12.43.48 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:43:49 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jyrlgSxy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNV-0006zR-4K; Tue, 30 Apr 2024 15:43:01 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNT-0006vw-UW for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:42:59 -0400 Received: from mail-pf1-x434.google.com ([2607:f8b0:4864:20::434]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNS-0006k4-0p for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:42:59 -0400 Received: by mail-pf1-x434.google.com with SMTP id d2e1a72fcca58-6ee0642f718so156252b3a.0 for ; Tue, 30 Apr 2024 12:42:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506176; x=1715110976; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=jyrlgSxyJz4WA3y0n8yczB1a1uw4UoXPRQKHoOAnOf08P2rE73TqGRS+pETOAc7uDu Qt/Ei9UYr1XdeA5hiVKFbslLaHG3oPk+MjwxBN9DBv9oeg6UtPGBYWGw2gcKk+dHE7Gb e+YIRgJ4qNnNLHXvOx/FR1pWwlNAwYEN86Hp9XW4srF4XwUhk40S8veQL8QkDBhdo3rp SaDB/YSlxVCOH24ljqKjets5R2lkD3ypzZN7jpOu683KL1eyXGbPTs8I0XeB3+P4bq4p yoa1rgPPD32pkOnsofGjQrwhG0cUmc57JUeB6ghDpgXdg8ZL6jtjKPY2OlRvIO7CRYDp lcvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506176; x=1715110976; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r70v09Z1jYUdUmudZIEve8pIi40latYcxErAtIALfYA=; b=k39JDDmRGdAo5ITxMPeQOiKexHZCEIcMrcjP/g478ZE4wEJl6fb8yIEJFtLB1oyyA5 plYoKckonBTpPZE7fHLIow91UKPz4+rF8NU4+heemsDkIt36SBnrtDMDdAwhhculR11l h1dzV7JVzdmCWYVCkXcUpX82nSqQRkKGoNkKhOLY2li/TL5ZVrPsuPw4msWfANLMAhAO dUvOXM+4UlOToEvCmlj76o4/6HLQklHy7gaWAd0S1qA+5Yv+09dzuOYdz5op7totagnL T/rcbK3o7reRQXFPRHTiyKrUMHp/aDSQ7OqV97XMmaNgmHBVJzErLuPRY6lRcMd03t5Y N3IQ== X-Gm-Message-State: AOJu0YzYf1otlUtoHc6NtbYs2alpwPaLAkeCmU2/hpbUpNM+kELU0BNC XT8WNbINZLGhm2vywhIpchqEG4M58/iL0T3uVsIyDimzNK4PFOgKYxwszMKfZQkQ+ZcvAIP0rBf l X-Received: by 2002:a05:6a20:7f99:b0:1af:37bf:d7de with SMTP id d25-20020a056a207f9900b001af37bfd7demr1320087pzj.7.1714506176156; Tue, 30 Apr 2024 12:42:56 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.42.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:42:55 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org, Alexander Monakov , Mikhail Romanov Subject: [PATCH v7 01/10] util/bufferiszero: Remove SSE4.1 variant Date: Tue, 30 Apr 2024 12:42:44 -0700 Message-Id: <20240430194253.904768-2-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::434; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x434.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov The SSE4.1 variant is virtually identical to the SSE2 variant, except for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing if an SSE register is all zeroes. The PTEST instruction decodes to two uops, so it can be handled only by the complex decoder, and since CMP+JNE are macro-fused, both sequences decode to three uops. The uops comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch standpoint. Hence, the use of PTEST brings no benefit from throughput standpoint. Its latency is not important, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-2-amonakov@ispras.ru> --- util/bufferiszero.c | 29 ----------------------------- 1 file changed, 29 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 3e6a5dfd63..f5a3634f9a 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len) } #ifdef CONFIG_AVX2_OPT -static bool __attribute__((target("sse4"))) -buffer_zero_sse4(const void *buf, size_t len) -{ - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - __builtin_prefetch(p); - if (unlikely(!_mm_testz_si128(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_testz_si128(t, t); -} - static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { @@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info) #endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, - { CPUINFO_SSE4, 64, buffer_zero_sse4 }, #endif { CPUINFO_SSE2, 64, buffer_zero_sse2 }, { CPUINFO_ALWAYS, 0, buffer_zero_int }, From patchwork Tue Apr 30 19:42:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793363 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp437908wrf; Tue, 30 Apr 2024 12:43:52 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCULiLc6wYmoEWplUlDxiQCstgwVrbm4Y845T59PRRadEV2BaaESaX7bVCMmq3kK6VxPMWIvw561YnkiTM8bb91J X-Google-Smtp-Source: AGHT+IG15YoSpY/gXfsmbqm2+7xLQQ83JmHT/ckPV3ENiaks0L86eNOrQCLOaWo6l6ggmkaHoVga X-Received: by 2002:a05:620a:229:b0:790:f408:bfd with SMTP id u9-20020a05620a022900b00790f4080bfdmr302589qkm.25.1714506232403; Tue, 30 Apr 2024 12:43:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506232; cv=none; d=google.com; s=arc-20160816; b=NKMsj0dOcqyHHW4DJ2bQVPHGlL9W43StZxt7PrXb/X+JKdzRjtNfpuE4/YKdr3rlve cnkuV14eM6luSCeTm5UpAHzAdD4M8vdAsIjEk1IGomXwIofZP8ojCuuZz2E5twZfz/2f uJUoaPxx6UWi6w77wxSAro3VNNvXcBHMFGiucyAy1DxlagRXe+NCjNqnDKUYtwDsux6n +JOf1064zbFmjhNlEHBHiaDzcuwrd7bS8qAr710OqFU1bqpKsgP6XIhE16oP9vjAtUNn G9wBV2V1yU0YJpalpKA22Kj3PhHAmkscV70HQOKg/13sgiHR4SX9KGX7Qw1I09sMC8yH /lfQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; fh=QtYERfLY+/6lXIVoMxqjAcWASzkMPfeQc3Ur4v8xQcU=; b=Pg/zWeShnLiRmsIIF9uOAX/VqruKTjZKJ3OWIOUbVh0mrZPA/+PELrctKQCOCOxWuO b63bfv29w+FmY1mM+MDuLwZ131/HTIKoo67mBcSQ8lxExxUEnLtWUrkdexZ9ThnYP6A5 v08H6PitX6HaaQ1J9APHHCWJDn1tgVB1g+uvl6Q5hnqD2XsHbh3U6h07vkHF0smdibQK CEfeyEO0Fh3c6F/YBZXVfR9CCy22Auvcr+YZMSBkiNuPwSGwomtASTl633fnBXX9RxnH 5m/Pubmt5Ap+yZz7MRdntUnC+CEoluvQh5viJ41wP/vjKgpISCnozz4ypRdRYojFkMup UxyQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Vvy3pkQD; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id p7-20020a05622a13c700b0043affa73469si4380471qtk.305.2024.04.30.12.43.52 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:43:52 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Vvy3pkQD; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNc-00074h-AQ; Tue, 30 Apr 2024 15:43:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNa-000748-Db for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:06 -0400 Received: from mail-pf1-x433.google.com ([2607:f8b0:4864:20::433]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNS-0006kF-H7 for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:06 -0400 Received: by mail-pf1-x433.google.com with SMTP id d2e1a72fcca58-6f2f6142d64so5787622b3a.2 for ; Tue, 30 Apr 2024 12:42:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506177; x=1715110977; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=Vvy3pkQD0rOsLiPZ03NwS5xQlZXHbhUAYCehCdY+hl5eCD7sfyYJlO8aPngZYr1ThH oVN8CPXC9rAXbX2HQwDVqTR5GpB8vjBDOCBfQxPmMxdJd4/xWNGhaa/7mdWE2gDNWfuV meKh8xr2G8OkmdazYVbm8pOsmoM5CHaYlZGOmg5Lh9+XKy/PW9sMUSoR7ZcgpkGQ3m30 MpXPTADjx+yp60XBf2hGMsrlQOhemEJG3Qnicy6Z+tPsRduQWFtNqT4vqgeMlhkhGZkQ B5pmfpeMxZfwx6waJsUx2ZUO3QAeb+xUVfOpr78NWi2S3tXpCtcdyf3e6YeaQ7nkgVs3 bKEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506177; x=1715110977; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Sc6Au+pExfgVbkEVt3xBcrgjk6DJCZWLxXzjjgNJTcI=; b=BXT28Xh4NvfQkL51eCSXqW5eib5seMk+DM+u/fsoVov8GYBtwuAZ3Jqs8RGLcbFGVc ivVha2VrlmCxZQNTsMoUKswWOxBYueOpmhhqiL8c6YtmtH0q8MOi0alJbyTxVd39n6WT qgXt45jto8104B8c4izBPUSmz1RmNo7dNzrvn+XS1+jx2XRubhjzzzXJxe82FVuHzWx2 D+W0R/iVC+3AvOn2SP8skZ1sgpPzrIWIzto1oL98+CaQMZNvbCjexlqqc12BEJh6q9dY RPffHElYdVmO3wrFKybktU2J6qBKUC+Z2OCw81mC86OafpwL6Ku4kMlw8KG4wpxew4Oz YN9w== X-Gm-Message-State: AOJu0YxLZ90zmo87hqxnUloaczETsx82iBvstAzkfUtasW4zIwJTeQDu 6HHdL3oqgogoS4lR6rdOClzP67ZLHqgfL66fGbO6MNycD94yLrS5v9j7o70SSEBUU9ferFL/EUT h X-Received: by 2002:a05:6a20:1fa5:b0:1a7:aa08:16de with SMTP id dm37-20020a056a201fa500b001a7aa0816demr697768pzb.40.1714506177058; Tue, 30 Apr 2024 12:42:57 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.42.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:42:56 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org, Alexander Monakov , Mikhail Romanov Subject: [PATCH v7 02/10] util/bufferiszero: Remove AVX512 variant Date: Tue, 30 Apr 2024 12:42:45 -0700 Message-Id: <20240430194253.904768-3-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::433; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x433.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD routines are invoked much more rarely in normal use when most buffers are non-zero. This makes use of AVX512 unprofitable, as it incurs extra frequency and voltage transition periods during which the CPU operates at reduced performance, as described in https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html Signed-off-by: Mikhail Romanov Signed-off-by: Alexander Monakov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-4-amonakov@ispras.ru> Signed-off-by: Richard Henderson --- util/bufferiszero.c | 38 +++----------------------------------- 1 file changed, 3 insertions(+), 35 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index f5a3634f9a..641d5f9b9e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len) } } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__) +#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include /* Note that each of these vectorized functions require len >= 64. */ @@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -#ifdef CONFIG_AVX512F_OPT -static bool __attribute__((target("avx512f"))) -buffer_zero_avx512(const void *buf, size_t len) -{ - /* Begin with an unaligned head of 64 bytes. */ - __m512i t = _mm512_loadu_si512(buf); - __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64); - __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64); - - /* Loop over 64-byte aligned blocks of 256. */ - while (p <= e) { - __builtin_prefetch(p); - if (unlikely(_mm512_test_epi64_mask(t, t))) { - return false; - } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } - - t |= _mm512_loadu_si512(buf + len - 4 * 64); - t |= _mm512_loadu_si512(buf + len - 3 * 64); - t |= _mm512_loadu_si512(buf + len - 2 * 64); - t |= _mm512_loadu_si512(buf + len - 1 * 64); - - return !_mm512_test_epi64_mask(t, t); - -} -#endif /* CONFIG_AVX512F_OPT */ - /* * Make sure that these variables are appropriately initialized when * SSE2 is enabled on the compiler command-line, but the compiler is * too old to support CONFIG_AVX2_OPT. */ -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) # define INIT_USED 0 # define INIT_LENGTH 0 # define INIT_ACCEL buffer_zero_int @@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info) unsigned len; bool (*fn)(const void *, size_t); } all[] = { -#ifdef CONFIG_AVX512F_OPT - { CPUINFO_AVX512F, 256, buffer_zero_avx512 }, -#endif #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, 128, buffer_zero_avx2 }, #endif @@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info) return 0; } -#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) +#if defined(CONFIG_AVX2_OPT) static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); From patchwork Tue Apr 30 19:42:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793366 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp437994wrf; Tue, 30 Apr 2024 12:44:09 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCU/f6pBuIBOckIAW41LhVIDzM66pso6bj14tJyqEbjZvVdTVGuSkHcweVF9t53I36TIq0BK4L21jRlKYRo6MGds X-Google-Smtp-Source: AGHT+IGU185Rh9LAUhhlyYPPJnmR+xqsJy64DY7OXiMKnTWOTws83qvzf0ZBmIeQ5asFUCdV2T0e X-Received: by 2002:a05:6122:3694:b0:4d4:2931:7d4d with SMTP id ec20-20020a056122369400b004d429317d4dmr810886vkb.5.1714506248623; Tue, 30 Apr 2024 12:44:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506248; cv=none; d=google.com; s=arc-20160816; b=YcniiABho4LPucWFzbbXGuorsyz/rcRpW3XjYqxpPCgJbmAQGb1NtPdrF2u9RCIZWK 9p91vWJFTaG76u1vBr+Yj0kr7L+kAp5c2jkZN00CJQHFxavFtmup3hScP3lyLZwaW1vT wjczHm+vXDsBQqSk/6fclxezpbYO9z/1Q8888lV2DzqYOKEvMzZy0rKUL+O3m1vfCZmw 24I4BFSdK1jR9aygnmP9JuHfGfwdgQw1UNkr3vmBQbm4zQLMLdwRn+VomkDoyOXOH2hz z/lLvGMT2Yk1uS3BVxvWz7SqhjPpkalUTGjcReW0ZiuALx7s8FMHuY1i/1ZVFa7sPnow X2PA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; fh=QtYERfLY+/6lXIVoMxqjAcWASzkMPfeQc3Ur4v8xQcU=; b=QwK9YkESL4HRX9rheFSppXKEJA0dVYYjFdIuBABmeegWkzuvtICrGrkr5+tKM4sLsN AwuMoy91zVFC7XjFiT/ZbjANeBu14qES1nDlk9AgBWpEhB2sYGpIEEzJVhBy7xTgVgWM W7HlmHlY8egcU36fBeOokW7cn407tF8snWv40TQxRvvskMI3DzMemAhmuQ7Tmt0ifwTL LgSC1noZfBOv/uOdI9qdP2fCLqTmZ/RvxHyDUsnkgQrGBaI7SEzDRV/1jqflWW468OQ3 9+dfol0OkJZjwCMU2g7aPY/jd5rlH6PXbuGkPzzfXpM4gbpe+hRXBleoSvx44D4UeymQ nHeg==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=vCNbjtn1; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id pc2-20020a05620a840200b0078edc088d1asi28346738qkn.160.2024.04.30.12.44.08 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:08 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=vCNbjtn1; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNY-00073H-GX; Tue, 30 Apr 2024 15:43:04 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNW-00072D-Ra for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:02 -0400 Received: from mail-pf1-x42c.google.com ([2607:f8b0:4864:20::42c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNU-0006kQ-PG for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:02 -0400 Received: by mail-pf1-x42c.google.com with SMTP id d2e1a72fcca58-6f3f6aa1437so2738006b3a.3 for ; Tue, 30 Apr 2024 12:43:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506178; x=1715110978; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; b=vCNbjtn1tt5wRcjh52RhtlARq/neSk7+xPSuF4QqYIBUdWnlrIFn9FsNzj4CJ6Htsp YPN9mdL4I7FvP40watbnmVJrC6QdEgpO3bdQNwRKos2cdLpKY1mVOOHzGZVtEV3TB+so W4aMxPZ9CX55QWxiF8NMCj8vagkteWHgKtuxb+P9l7F+7fxfqSuYSMaDsFpm9lGtxfW+ pBAYJxZ8U0QHfllBkuI3tTATnSwbJalr27aUGp7M1TWvEN/lLdKuhRy2xk8V+sK/wAnM 17niRDjC9d+oVGX8uorhr/FofWmCr/u0cnlGU1IhwTjwbK/bMOHaohkNM/geWoSiQZEx tyzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506178; x=1715110978; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Aco89HGhDUqxtzZSl7gQEXCIzX7CLhr3rgmAxnnD0GI=; b=gLJcYGbjZfVNX7wpRo5IAa6Zw2YUUnlJJHBdji2j4F86ojsgyrJI2Za0f2jM6ES7kn XyKIMD82vQrAcg+3/mKzftFlN7ouJTzK5To7vgOq6AZpcu8I9gyeLwJGSDitejxRsYf7 qdnjaStPx5rfZI4GDKoPVbnTShFnr+Q+e+fjpKSIOyb5Xe0VxWHzni16Mwjdr6U7UAzZ N6Z/yXU/7zylzZ0PsYmUBkRJ1nvnz7GfrOjGxFgR8vG724iFjJwB67NjsB09g9qUBqOa U9teLO715yauZ6XLZF01MlE8qF61T2QAQfPTdnow2E8kW0KWn/ekbHFVnnkmcqV0TcVj E/CA== X-Gm-Message-State: AOJu0YyfQLiaSCBprc9GkQcycuJkmg7hbVA1UYgrY+O53kPoKneCQnOo 6tnTfL1OAqf5Gq3uqesdqKvTkd5lExiF7gXWfSAYPldFeo+hwuCtWxtCvXKoBsGHvUsy/hp9Vvl z X-Received: by 2002:a05:6a21:3a83:b0:1af:66a9:d104 with SMTP id zv3-20020a056a213a8300b001af66a9d104mr451437pzb.1.1714506177934; Tue, 30 Apr 2024 12:42:57 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.42.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:42:57 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org, Alexander Monakov , Mikhail Romanov Subject: [PATCH v7 03/10] util/bufferiszero: Reorganize for early test for acceleration Date: Tue, 30 Apr 2024 12:42:46 -0700 Message-Id: <20240430194253.904768-4-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42c; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x42c.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Test for length >= 256 inline, where is is often a constant. Before calling into the accelerated routine, sample three bytes from the buffer, which handles most non-zero buffers. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Message-Id: <20240206204809.9859-3-amonakov@ispras.ru> [rth: Use __builtin_constant_p; move the indirect call out of line.] Signed-off-by: Richard Henderson --- include/qemu/cutils.h | 32 ++++++++++++++++- util/bufferiszero.c | 84 +++++++++++++++++-------------------------- 2 files changed, 63 insertions(+), 53 deletions(-) diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h index 92c927a6a3..741dade7cf 100644 --- a/include/qemu/cutils.h +++ b/include/qemu/cutils.h @@ -187,9 +187,39 @@ char *freq_to_str(uint64_t freq_hz); /* used to print char* safely */ #define STR_OR_NULL(str) ((str) ? (str) : "null") -bool buffer_is_zero(const void *buf, size_t len); +/* + * Check if a buffer is all zeroes. + */ + +bool buffer_is_zero_ool(const void *vbuf, size_t len); +bool buffer_is_zero_ge256(const void *vbuf, size_t len); bool test_buffer_is_zero_next_accel(void); +static inline bool buffer_is_zero_sample3(const char *buf, size_t len) +{ + /* + * For any reasonably sized buffer, these three samples come from + * three different cachelines. In qemu-img usage, we find that + * each byte eliminates more than half of all buffer testing. + * It is therefore critical to performance that the byte tests + * short-circuit, so that we do not pull in additional cache lines. + * Do not "optimize" this to !(a | b | c). + */ + return !buf[0] && !buf[len - 1] && !buf[len / 2]; +} + +#ifdef __OPTIMIZE__ +static inline bool buffer_is_zero(const void *buf, size_t len) +{ + return (__builtin_constant_p(len) && len >= 256 + ? buffer_is_zero_sample3(buf, len) && + buffer_is_zero_ge256(buf, len) + : buffer_is_zero_ool(buf, len)); +} +#else +#define buffer_is_zero buffer_is_zero_ool +#endif + /* * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128) * Input is limited to 14-bit numbers diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 641d5f9b9e..972f394cbd 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,8 +26,9 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool -buffer_zero_int(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t); + +static bool buffer_is_zero_integer(const void *buf, size_t len) { if (unlikely(len < 8)) { /* For a very small buffer, simply accumulate all the bytes. */ @@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -/* - * Make sure that these variables are appropriately initialized when - * SSE2 is enabled on the compiler command-line, but the compiler is - * too old to support CONFIG_AVX2_OPT. - */ -#if defined(CONFIG_AVX2_OPT) -# define INIT_USED 0 -# define INIT_LENGTH 0 -# define INIT_ACCEL buffer_zero_int -#else -# ifndef __SSE2__ -# error "ISA selection confusion" -# endif -# define INIT_USED CPUINFO_SSE2 -# define INIT_LENGTH 64 -# define INIT_ACCEL buffer_zero_sse2 -#endif - -static unsigned used_accel = INIT_USED; -static unsigned length_to_accel = INIT_LENGTH; -static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL; - static unsigned __attribute__((noinline)) select_accel_cpuinfo(unsigned info) { /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - unsigned len; bool (*fn)(const void *, size_t); } all[] = { #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, 128, buffer_zero_avx2 }, + { CPUINFO_AVX2, buffer_zero_avx2 }, #endif - { CPUINFO_SSE2, 64, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, 0, buffer_zero_int }, + { CPUINFO_SSE2, buffer_zero_sse2 }, + { CPUINFO_ALWAYS, buffer_is_zero_integer }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { if (info & all[i].bit) { - length_to_accel = all[i].len; - buffer_accel = all[i].fn; + buffer_is_zero_accel = all[i].fn; return all[i].bit; } } return 0; } -#if defined(CONFIG_AVX2_OPT) +static unsigned used_accel; + static void __attribute__((constructor)) init_accel(void) { used_accel = select_accel_cpuinfo(cpuinfo_init()); } -#endif /* CONFIG_AVX2_OPT */ + +#define INIT_ACCEL NULL bool test_buffer_is_zero_next_accel(void) { @@ -194,36 +173,37 @@ bool test_buffer_is_zero_next_accel(void) used_accel |= used; return used; } - -static bool select_accel_fn(const void *buf, size_t len) -{ - if (likely(len >= length_to_accel)) { - return buffer_accel(buf, len); - } - return buffer_zero_int(buf, len); -} - #else -#define select_accel_fn buffer_zero_int bool test_buffer_is_zero_next_accel(void) { return false; } + +#define INIT_ACCEL buffer_is_zero_integer #endif -/* - * Checks if a buffer is all zeroes - */ -bool buffer_is_zero(const void *buf, size_t len) +static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; + +bool buffer_is_zero_ool(const void *buf, size_t len) { if (unlikely(len == 0)) { return true; } + if (!buffer_is_zero_sample3(buf, len)) { + return false; + } + /* All bytes are covered for any len <= 3. */ + if (unlikely(len <= 3)) { + return true; + } - /* Fetch the beginning of the buffer while we select the accelerator. */ - __builtin_prefetch(buf); - - /* Use an optimized zero check if possible. Note that this also - includes a check for an unrolled loop over 64-bit integers. */ - return select_accel_fn(buf, len); + if (likely(len >= 256)) { + return buffer_is_zero_accel(buf, len); + } + return buffer_is_zero_integer(buf, len); +} + +bool buffer_is_zero_ge256(const void *buf, size_t len) +{ + return buffer_is_zero_accel(buf, len); } From patchwork Tue Apr 30 19:42:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793369 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp438047wrf; Tue, 30 Apr 2024 12:44:20 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCU/sFaM7gQKqAksD5qx46COaLFR9vLVuWAwIg+HYZKCGRlKs6VlABOd0rFx+FhCUNfge8Aa93PaamvWjMaBTufb X-Google-Smtp-Source: AGHT+IFSU/b+2dVEse3816xpxRZwBs+rvTqBYFtL8TTwrFTydxmgcRbAK8yu1H7eaOCvDmPa8bTu X-Received: by 2002:a05:6214:ca3:b0:6a0:9770:39c2 with SMTP id s3-20020a0562140ca300b006a0977039c2mr289566qvs.54.1714506260388; Tue, 30 Apr 2024 12:44:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506260; cv=none; d=google.com; s=arc-20160816; b=ibQ/A9/cB4shKZXoBAH7xmHgxzY0LjeZgxN43LrdeGTeSHbt63eKdxxCVq4IP8zboO fjyL+8usS6ugyeqN8s8ZEg5FJswT6QLLBFdjK2aXyGohkbmbEh6V3/lNWMjnCaj7GseI S324wid2+mZQJE6BY5ARM4UvTvPbtnj4MLfnLl4s8ab0fJceHORmRM6/m9RLZwOkxA/v D44WgWncnXHMyHHYIPVBG3yVc6Bo6VsMXyB5xgPlwQLYSDryxS2/JDmU3lc3CB00ISP1 16lZieFwNLO0xojdnjL9aXAjaMP8+aCVzzF3MInV0XpDB09hZAgedfn9m3s/JbYWABtZ a7PQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; fh=QtYERfLY+/6lXIVoMxqjAcWASzkMPfeQc3Ur4v8xQcU=; b=QrcolW/scgPBBQ9Pl+9W+BpJqWqctGxkmrFZz6qHw6gzEJJYTj2yXgXTTpvE4GC2sq XzVsFjGradgwpT+iUpBJ81jOyyks13H/eUpRCmDlFcbMzz+hXiGG6TZ1jbjTzSOSmUwW tsRTqmtJHHVPQgNhmDR26TljqhdxiC/nHrTmLkTHISZKQ1BmcPQuIbJh3QMDN3YBxfKy E9dC4QMspeZWWN0HeBo+K3Q7P5vPrs2k/cEk310sJY7CK2gA5Pw+Gur/auNDsz3dNv/m eoB8tyHOm72iUsM9fHMvk6lfet0KvRcaSIpcbrgNITFs16l7m7HMUHZyERgPLblCujCa DlRQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=N88rQM1c; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id kj22-20020a056214529600b006a0ac634515si11089621qvb.229.2024.04.30.12.44.20 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:20 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=N88rQM1c; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNX-00072Y-8t; Tue, 30 Apr 2024 15:43:03 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNW-00071u-1y for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:02 -0400 Received: from mail-oa1-x33.google.com ([2001:4860:4864:20::33]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNU-0006kX-Cw for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:01 -0400 Received: by mail-oa1-x33.google.com with SMTP id 586e51a60fabf-23d5df2d600so100186fac.3 for ; Tue, 30 Apr 2024 12:42:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506179; x=1715110979; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=N88rQM1c5IUmb3VjDJolYhKegfgxGmqPMMbu+0FWDIQrqxKsuCZzuV3NsMQQ6r8tpw QU0nvpEkSfA2n+oTSXj7eLh79KTOgXx6hIfEId7QNaFn0Ii202ESF20HYoxirvOhZ0dQ /+IrzKpeiROS567MiI9ikyBQytt/FjG6gHWh2E0zYOEzOS54fZmCpcuBpM/NpGWFk8pH lFBKEtpoCF1fZiN1/ua/McB4F84DPY27lsIvrwLjr+6QXSdDrV8zFD4CuaOavg3sONWE pKDbZ1aK3iue5L09HBd7SJhPuGs07w7Hu/iPwGIHQ+7t5K9kOArowM0Araei7SZc3/X7 7vhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506179; x=1715110979; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=LAYreS7BB7OrJ1s1J0w9E53SImG5id2PL58BIAiQIp8s6axrRIdQpMJuv0g0wcBhWX GKuPpw48IiFWxA3JjspVeNGDOt3STfKuaW040uuzdALpwcrmdB0YebKwe+NgDlE1DwXr xNPwofbY9KwBkuDZzmSg/QyS1W/G5pL/pD0wIgfle82bcisxFaOhiGpJNsAXbDiNysaZ oRUKRhEOBIBgPhmaswHrmTUie5QCKPlDzeK4DheXsS5qtmMaM97BrStYfzB0ekJEQ6K0 YJBDSb5SyWRcP9C27pK4j8VxpawTJmS1ZHTTWgm/RQ1xJVXL9opQ/IL6Pb3hh7kDeZe4 xIMQ== X-Gm-Message-State: AOJu0Yxaaubal0baRynM1iZuvPPGffrrD9ZpjoPrhZnS8P9CYw1C9tJ6 EocGpMEFDHsEBFEBIpm5wzuO629Gc1A06B2tIum8587lYAWRXakgKMHl3xp8DAcNwJmBk/ez2Bm s X-Received: by 2002:a05:6870:4153:b0:23b:f2d0:7b9c with SMTP id r19-20020a056870415300b0023bf2d07b9cmr521780oad.24.1714506178861; Tue, 30 Apr 2024 12:42:58 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.42.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:42:58 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org, Alexander Monakov , Mikhail Romanov Subject: [PATCH v7 04/10] util/bufferiszero: Remove useless prefetches Date: Tue, 30 Apr 2024 12:42:47 -0700 Message-Id: <20240430194253.904768-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4860:4864:20::33; envelope-from=richard.henderson@linaro.org; helo=mail-oa1-x33.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Use of prefetching in bufferiszero.c is quite questionable: - prefetches are issued just a few CPU cycles before the corresponding line would be hit by demand loads; - they are done for simple access patterns, i.e. where hardware prefetchers can perform better; - they compete for load ports in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-5-amonakov@ispras.ru> --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 972f394cbd..00118d649e 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); for (; p + 8 <= e; p += 8) { - __builtin_prefetch(p + 8); if (t) { return false; } @@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len) /* Loop over 16-byte aligned blocks of 64. */ while (likely(p <= e)) { - __builtin_prefetch(p); t = _mm_cmpeq_epi8(t, zero); if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { return false; @@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len) /* Loop over 32-byte aligned blocks of 128. */ while (p <= e) { - __builtin_prefetch(p); if (unlikely(!_mm256_testz_si256(t, t))) { return false; } From patchwork Tue Apr 30 19:42:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793368 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp438022wrf; Tue, 30 Apr 2024 12:44:14 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUjCaOzxJm/238KYNLt2xt81jnPh12DxlvY7XqviP0cO+eclbLJOm6cMd5VDxcACbrtFsvcDcHcJPdvbiVG3jzY X-Google-Smtp-Source: AGHT+IEIKviQT0FG9KGTgOZZON6MMcoPEJyEDnnMTWaXwFPWyYWg/QOi4xMPnYg63xC6pnXcw4Jp X-Received: by 2002:a05:6214:2a8f:b0:6a0:7a3f:d290 with SMTP id jr15-20020a0562142a8f00b006a07a3fd290mr309730qvb.51.1714506253822; Tue, 30 Apr 2024 12:44:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506253; cv=none; d=google.com; s=arc-20160816; b=Mt3hMBYJewHbbTtCM8tMKyXqnsfI4qkLnjiAZ4Oax/7oINMVZvdIJSrJziRcTaioTu CgCE21mw8LqTaL3deGBx4sDP4Wen64jb7HbBa+gG7ut7iDwhHOB1shbx7wgi7tbBsafH uWeNmHthiQaEj+WjFFFA2wWjRK77Bbb+hKAzkM/S3pnGsWX5jzwdJgxCa5nuSLR3Orw8 1WWDPI+yscS/oUnQcQ1CzOIlviaVeoEZgqAAz2m9ny4ka1HD+ALmb2CBtSo1MjlFyBYF JPW9wEIiH3iO8HoEbOzM98lfzYu4AMapl3oMxgteSx5i8gkcBXhQDqFegUsr9qgVmQlP fO1g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; fh=QtYERfLY+/6lXIVoMxqjAcWASzkMPfeQc3Ur4v8xQcU=; b=boS9fdk04DWcZwHscVaNzRq+V1/hpEZE3G2+XV6horWWFYnUAXmwzD3RKVxJbp84LA t2GdEL1fVvYLe9fAikAIWJi1J/leKrPlLXSwNL4uaaGJbPHHGwjUR7hkGgDda68QQDnh oDSMWcPgMbuUuVkOGjT+HYrN5Vxhy9U11/2rwdCfGRp5sSIqs7zxtZXrKf9ZSVFBbfrn rUGm19XpF0zexKhQG/y3Ca5nxhmNrv5Xctz2gxW1unO1tWBMmNTINoWbSaZ92F/cvI4i 7p11VvlZpUE/oF/DeY4D1fqVrqGVJYugm2+dPRnGNnpqo7HPdCLyH4kvTsdG8SBCL46y qwmQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=gSMClsFR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id o11-20020a0562140e4b00b006a0bf9c8841si7763971qvc.528.2024.04.30.12.44.13 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:13 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=gSMClsFR; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNa-00073q-27; Tue, 30 Apr 2024 15:43:06 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNX-00072a-Kp for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:03 -0400 Received: from mail-pf1-x42e.google.com ([2607:f8b0:4864:20::42e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNV-0006kq-7w for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:03 -0400 Received: by mail-pf1-x42e.google.com with SMTP id d2e1a72fcca58-6f4081574d6so2246626b3a.2 for ; Tue, 30 Apr 2024 12:43:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506180; x=1715110980; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; b=gSMClsFRSR5xf9U9wbcf9robQBXxUisFTLnoRUnOOt3MlWjKNzjegYqPjjdmk96HFw iG4etmq9rOIWIG2Sj8y9Mk0C3cAY0Q2zF5boWZ/JuDUYudNqtIud4OAQ7kiSBWNTaFXP Z82aV1hPxVzDweHd0oblsDUPH9wayou7AAERfMQTV6EJsN8QjJzMqd663pn5LCM0XLfH qTOwUEzg8ALaopowl35TLMqruDbL2gDt6R9Z8c/rL3BQUcywIEkAWBQEW4EUoqqI0Oq9 XEITEpNagPfqi9253lmVF+9xl4e0EK7sdQcpGo/pWzHLFExRhCs08TggyPu/XKRunRXD K0qw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506180; x=1715110980; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LJEBA8HhooZ+SwbOdLUPlZU44jMDHr0d5xq5JBRqrDA=; b=glh65GU01lSSzgHm5aHAON8iw6mEVWmA5cSSAqNDJdVrFQtHHzIdVttwcL/Vhb/YlR kmpDxhVj8IjcaK4q2VORI3DDtxFl8xRWDR7TNIrDmvf2Kydvl/mkjC/pk73g2jxcnqfx Hzp1y423amWbUUx5uKbW/2dcG2GEgAhKbb4uIVgSgwDCGF1anxv4vjMYjMnTHPyton76 ebcdTPaovCpQNeIvFviI/nFi5cufI306aqq9FnAtlTZgXEEP/cVZ9UePYva6k6P46Tvw TCwgBfldGC07grvoqskwaPO0JnWXp6Iy0K9wV1bJOmk8SFIdMA0OIyDqSywxRSTB4Xyv jZSw== X-Gm-Message-State: AOJu0YwHwP0Qj67qFjQXBHP6W+DIus66o8REaO1ECp1nv4IuuJHsFpII VfIre9v+DbiepvgQImHmsYAZmcllBjAIQmN9DHTV60UMd27Ba3i2yXtD+yOeCgHDm+FNHNlqof5 l X-Received: by 2002:a05:6a00:842:b0:6f3:f062:c09b with SMTP id q2-20020a056a00084200b006f3f062c09bmr705486pfk.6.1714506179829; Tue, 30 Apr 2024 12:42:59 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.42.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:42:59 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org, Alexander Monakov , Mikhail Romanov Subject: [PATCH v7 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants Date: Tue, 30 Apr 2024 12:42:48 -0700 Message-Id: <20240430194253.904768-6-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42e; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x42e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Increase unroll factor in SIMD loops from 4x to 8x in order to move their bottlenecks from ALU port contention to load issue rate (two loads per cycle on popular x86 implementations). Avoid using out-of-bounds pointers in loop boundary conditions. Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov Reviewed-by: Richard Henderson Message-Id: <20240206204809.9859-6-amonakov@ispras.ru> --- util/bufferiszero.c | 111 +++++++++++++++++++++++++++++--------------- 1 file changed, 73 insertions(+), 38 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 00118d649e..02df82b4ff 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len) #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) #include -/* Note that each of these vectorized functions require len >= 64. */ +/* Helper for preventing the compiler from reassociating + chains of binary vector operations. */ +#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1)) + +/* Note that these vectorized functions may assume len >= 256. */ static bool __attribute__((target("sse2"))) buffer_zero_sse2(const void *buf, size_t len) { - __m128i t = _mm_loadu_si128(buf); - __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16); - __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16); - __m128i zero = _mm_setzero_si128(); + /* Unaligned loads at head/tail. */ + __m128i v = *(__m128i_u *)(buf); + __m128i w = *(__m128i_u *)(buf + len - 16); + /* Align head/tail to 16-byte boundaries. */ + const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + __m128i zero = { 0 }; - /* Loop over 16-byte aligned blocks of 64. */ - while (likely(p <= e)) { - t = _mm_cmpeq_epi8(t, zero); - if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + v = _mm_cmpeq_epi8(v, zero); + if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + p += 8; + } while (p < e - 7); - /* Finish the aligned tail. */ - t |= e[-3]; - t |= e[-2]; - t |= e[-1]; - - /* Finish the unaligned tail. */ - t |= _mm_loadu_si128(buf + len - 16); - - return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF; + return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF; } #ifdef CONFIG_AVX2_OPT static bool __attribute__((target("avx2"))) buffer_zero_avx2(const void *buf, size_t len) { - /* Begin with an unaligned head of 32 bytes. */ - __m256i t = _mm256_loadu_si256(buf); - __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32); - __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32); + /* Unaligned loads at head/tail. */ + __m256i v = *(__m256i_u *)(buf); + __m256i w = *(__m256i_u *)(buf + len - 32); + /* Align head/tail to 32-byte boundaries. */ + const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32); + const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32); + __m256i zero = { 0 }; - /* Loop over 32-byte aligned blocks of 128. */ - while (p <= e) { - if (unlikely(!_mm256_testz_si256(t, t))) { + /* Collect a partial block at tail end. */ + v |= e[-1]; w |= e[-2]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-3]; w |= e[-4]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-5]; w |= e[-6]; + SSE_REASSOC_BARRIER(v, w); + v |= e[-7]; v |= w; + + /* Loop over complete 256-byte blocks. */ + for (; p < e - 7; p += 8) { + /* PTEST is not profitable here. */ + v = _mm256_cmpeq_epi8(v, zero); + if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) { return false; } - t = p[-4] | p[-3] | p[-2] | p[-1]; - p += 4; - } ; + v = p[0]; w = p[1]; + SSE_REASSOC_BARRIER(v, w); + v |= p[2]; w |= p[3]; + SSE_REASSOC_BARRIER(v, w); + v |= p[4]; w |= p[5]; + SSE_REASSOC_BARRIER(v, w); + v |= p[6]; w |= p[7]; + SSE_REASSOC_BARRIER(v, w); + v |= w; + } - /* Finish the last block of 128 unaligned. */ - t |= _mm256_loadu_si256(buf + len - 4 * 32); - t |= _mm256_loadu_si256(buf + len - 3 * 32); - t |= _mm256_loadu_si256(buf + len - 2 * 32); - t |= _mm256_loadu_si256(buf + len - 1 * 32); - - return _mm256_testz_si256(t, t); + return _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) == 0xFFFFFFFF; } #endif /* CONFIG_AVX2_OPT */ From patchwork Tue Apr 30 19:42:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793364 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp437940wrf; Tue, 30 Apr 2024 12:44:01 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUkc1Fpez52FRj4BikpwWvDD+11vb8qUBJv5aySPVH85zScbg0Y8/6FHCTPkRqp+BMKEfEiGsQJsUYTTyd0BQ6T X-Google-Smtp-Source: AGHT+IGpwShShN/AsUUDhIJOZsAPXf7BQF3+f1728+KxUK7CzAF1WDnF6fWI3mfzfCG7Azum8ABV X-Received: by 2002:a05:622a:1786:b0:43a:bcd7:9898 with SMTP id s6-20020a05622a178600b0043abcd79898mr356243qtk.5.1714506241589; Tue, 30 Apr 2024 12:44:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506241; cv=none; d=google.com; s=arc-20160816; b=f7lfwCSkJBgydouov55DDstFNYoEcLiKFjp9dpKlLkE/uvCPCvQ41tyDY0MiQyUTQa jqgx7GPfk6fJKUzPvcR2cAM51b6OuIVE2S0Y84c+1Wo+SYNuoeK1QuHuDF9/Cd/dNo1C 4VBQUlMifVBvbO5t5fBo1Z95EYjx9iwxyZeciyYJubDgc8VKIMlyHioh3Zkno9XCrkHN lKehuFsphcJADjysZfTstwjmKQ0/Frsmw0sht29Eo1CtB1gVRNssn1hVq6EZFasDyC23 cksxH9uepH39X59pqFS0GwjBOGSn4dLsN9nKX5WdtbD8CP1bAQTuX+Bvbliiq/luAmwa xWDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=0Qmmrii1MrzzFxsfdpzzNrItPzYs13dw3hkqKeZdH+k=; fh=ycigHkwOPieh2u5PmcA89sNk/qPD1ZkdDlL8Mchfvw8=; b=LB0XJPyrhn341/peFfP8snBi/qIY/eCCxT5eB4rIIoS9tJbEOe/A5l6LYnVSFm57FB XXw6ET881tscCxaDQY00bzDEdf5EfmXvFovb6wJKF/1gnpZS10kqagyuy2cBP3x3hnP6 bPecK7TnUy2DhYn+cZTILV08ODGG4XcA8vCQpdKynuVKSoh87GBTzN3MiHPqcPE0IOFs gjYpvz9ksW3jkXMx+MDtYq+OJV4Z4I0s2uOj+ZTsEtUalNiXlLNCVxdHjv4WvXtychje CW4bVNzKNn22qjTN07fynJIq5h6AVs044mVyLx4J/2btxiOpuNezJfc6rO8GsucyooK1 RzQA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=cy4zmznJ; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id bb1-20020a05622a1b0100b0043aef784c17si5016396qtb.697.2024.04.30.12.44.01 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:01 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=cy4zmznJ; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNa-00073I-0Q; Tue, 30 Apr 2024 15:43:06 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNY-00072q-1R for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:04 -0400 Received: from mail-pf1-x42e.google.com ([2607:f8b0:4864:20::42e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNW-0006ku-7B for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:03 -0400 Received: by mail-pf1-x42e.google.com with SMTP id d2e1a72fcca58-6f3f6aa1437so2738054b3a.3 for ; Tue, 30 Apr 2024 12:43:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506181; x=1715110981; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0Qmmrii1MrzzFxsfdpzzNrItPzYs13dw3hkqKeZdH+k=; b=cy4zmznJAcao1kBq08IuJhK25cncXnPZWLj/14XHPtz/YO1J8pP4rNt5JQ5FalpaWp 1Xh15Ru4N/ZT9bSd4AnX2gOknlthWo3kdqpBs9dRaxl0IZ1oXmXYg1DNS8VUylT1oV3i TknEjBvmemhaEGEnoVwo3J6kbuS8RryUy4yY3CCRFUBCn9jds9dDxOJUU+vUkJ5p3sNl ftVfbMKSPmx5eP/3A13hVMWNGU+TM15ik4giuZTJ1iTDIus8KnK6m4jKcMZcwBLanfwy qgtfhVbpo0u1x7e52XIWlpoc+/SxFtHNvl4akNQsqTyNnAduH1B0O9vBxy/0or1iT9gZ 9byw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506181; x=1715110981; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0Qmmrii1MrzzFxsfdpzzNrItPzYs13dw3hkqKeZdH+k=; b=FgojV91O4PaH87xGPapPeS0dMsk05g64Wx8mE6bn3BJaVF2kUonLl08SiWx0t4AXxn NVp7H5Asrivg446dRSqjY3aKyCeNvr04em1BnyTm9x7D11zlayXMsvDzAnXwSrxaZbFv eRIWPyYNGrtjsmm+dfCOZ2SYnRPd6QvsnpfLD7ZIH0oI9smaZ2j3IhDYMn26LjTOo+bq kcdowNxS7F2THUe7UN9MApPZ9SxbRsfnbkBb8VmpEfMI0E3iPWhm3SO97tsj6sLppzy2 okiRV9Gt92mNxHKMGTerLC1g1GUl6GpPt92f2tZ/Szso+LEAsFD0qGhOBogH17kbEKkB QISA== X-Gm-Message-State: AOJu0YztbZnbfZ0dEe/X2gGtLH643pSRIpU7GxnLFqNIWUSqTC5keFC7 DefN8jMvHJ3A7Z6By1SwRh9vLbfSBsLeDuHtnMVN80jHTKrtvf+cJ2BBIqlwGGAXD9L7Al3Szhn h X-Received: by 2002:a05:6a00:3d41:b0:6f3:ea4b:d24d with SMTP id lp1-20020a056a003d4100b006f3ea4bd24dmr710186pfb.9.1714506180658; Tue, 30 Apr 2024 12:43:00 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.43.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:43:00 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org Subject: [PATCH v7 06/10] util/bufferiszero: Improve scalar variant Date: Tue, 30 Apr 2024 12:42:49 -0700 Message-Id: <20240430194253.904768-7-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42e; envelope-from=richard.henderson@linaro.org; helo=mail-pf1-x42e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Split less-than and greater-than 256 cases. Use unaligned accesses for head and tail. Avoid using out-of-bounds pointers in loop boundary conditions. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- util/bufferiszero.c | 85 +++++++++++++++++++++++++++------------------ 1 file changed, 51 insertions(+), 34 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 02df82b4ff..c9a7ded016 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -28,40 +28,57 @@ static bool (*buffer_is_zero_accel)(const void *, size_t); -static bool buffer_is_zero_integer(const void *buf, size_t len) +static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { - if (unlikely(len < 8)) { - /* For a very small buffer, simply accumulate all the bytes. */ - const unsigned char *p = buf; - const unsigned char *e = buf + len; - unsigned char t = 0; + uint64_t t; + const uint64_t *p, *e; - do { - t |= *p++; - } while (p < e); - - return t == 0; - } else { - /* Otherwise, use the unaligned memory access functions to - handle the beginning and end of the buffer, with a couple - of loops handling the middle aligned section. */ - uint64_t t = ldq_he_p(buf); - const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8); - const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8); - - for (; p + 8 <= e; p += 8) { - if (t) { - return false; - } - t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; - } - while (p < e) { - t |= *p++; - } - t |= ldq_he_p(buf + len - 8); - - return t == 0; + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer. + */ + if (unlikely(len <= 8)) { + return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0; } + + t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Read 0 to 31 aligned words from the middle. */ + while (p < e) { + t |= *p++; + } + return t == 0; +} + +static bool buffer_is_zero_int_ge256(const void *buf, size_t len) +{ + /* + * Use unaligned memory access functions to handle + * the beginning and end of the buffer. + */ + uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Collect a partial block at the tail end. */ + t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1]; + + /* + * Loop over 64 byte blocks. + * With the head and tail removed, e - p >= 30, + * so the loop must iterate at least 3 times. + */ + do { + if (t) { + return false; + } + t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; + p += 8; + } while (p < e - 7); + + return t == 0; } #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) @@ -173,7 +190,7 @@ select_accel_cpuinfo(unsigned info) { CPUINFO_AVX2, buffer_zero_avx2 }, #endif { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_integer }, + { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { @@ -211,7 +228,7 @@ bool test_buffer_is_zero_next_accel(void) return false; } -#define INIT_ACCEL buffer_is_zero_integer +#define INIT_ACCEL buffer_is_zero_int_ge256 #endif static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; @@ -232,7 +249,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len) if (likely(len >= 256)) { return buffer_is_zero_accel(buf, len); } - return buffer_is_zero_integer(buf, len); + return buffer_is_zero_int_lt256(buf, len); } bool buffer_is_zero_ge256(const void *buf, size_t len) From patchwork Tue Apr 30 19:42:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793365 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp437976wrf; Tue, 30 Apr 2024 12:44:07 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUag9O6zZbynyuVE20JBoESuefDeRKKFtfCFK3vBwmQZqyNZJ2ed3KbGSRxXQrkA/8S9OH8xoKVmyTDS0X8wi61 X-Google-Smtp-Source: AGHT+IFIQRv2E0npkNcWWcSA4JplYUIPdLGFk/ewineKovkWhrSTYjfpAdKA/UCaWOd79rP57fBY X-Received: by 2002:a05:6214:234d:b0:6a0:af07:1089 with SMTP id hu13-20020a056214234d00b006a0af071089mr431163qvb.55.1714506247018; Tue, 30 Apr 2024 12:44:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506247; cv=none; d=google.com; s=arc-20160816; b=S9gvD65eoW/Zomw6mLuGWpcenpsg/lM6JAc5ku03w/w8ZeI3/d1ZTTICf1Ti0e9rtf BZXlTHR+9jwAwbL6Ht0XLCsnpBer3qDrFpJa0th2TFwQnUmqQAgmFfYpf83XiEjfmi9J 2FzBIm5PAgGpJHJtIXSsbJjmKV/TQSFrC4abrY5B/rUwJbARy7Pq1djD99ypE5ab2B2i oPTdzqm+RKo/uJUT/g+SXPzVxy0E4TYwcU25IYxWUjCtmsDQd3mUui4+STfDRABE5Lhj Gjz31C1rFjBt8w39hhhuU5npdYUMxx60ZqUJ7weXy5NLV7Y5DcrO4EV1yJHCRTUEgz8e lgZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=oO6lVL+LVK6xULV3eqZmGPXjUDThyU0i78qX8xOIR9w=; fh=ycigHkwOPieh2u5PmcA89sNk/qPD1ZkdDlL8Mchfvw8=; b=YNrf3O0vbgDRZchxVf6xhyNuuz/UxxOGVYFWjfEPlMndx+xQ6YJ0KKljtiRN8ul8RJ Q4IPVhJhQcX/VxpHFe7LGgOKLiGsUDPt20PSrl+EQiqUxLnzVyxy5SAhl04tXSUBUEPf mJc9xcSBo8N4R6D95YrKLRdWE187mhocnstta0CAuRqgcxkgLG4aXWuftydc0bxy5G08 l+SEbyogjH8ZNMisKmP3YcuU9thXOSWIEm/7wc+2S+GCjDVyPxwbc1KCSKk0fhxvb2V0 8bebGa4VO3eXJLg40COqVQmczaeMhhg0rFB1Xn0y/rXkovKlqXodjbbgbko6EZ4XP+NU q4vw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=IFcYC42u; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id c2-20020ad45ae2000000b0069b3ca5bd2esi30596854qvh.603.2024.04.30.12.44.06 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:06 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=IFcYC42u; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNc-00075A-Rc; Tue, 30 Apr 2024 15:43:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNa-00073u-6B for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:06 -0400 Received: from mail-io1-xd32.google.com ([2607:f8b0:4864:20::d32]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNX-0006kz-KT for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:05 -0400 Received: by mail-io1-xd32.google.com with SMTP id ca18e2360f4ac-7dedc577011so62704739f.3 for ; Tue, 30 Apr 2024 12:43:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506181; x=1715110981; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oO6lVL+LVK6xULV3eqZmGPXjUDThyU0i78qX8xOIR9w=; b=IFcYC42udXyZLO8cURfsKnbyrTtAgg5E9OTLzTxaR7EOhJKmprgazy0APBJ4PE2Isi lryanpzj9+nWFsPec9Ajv9Bb41TBVJ1EihbvV4naQtR6Dyd4FMp8WUJXmPCgX9b6vL6c Q9s9s/frq9lsmyCCzyyIqMq5QVAH+3eZy1E3VnEg+hoN9dIicAMnibuxt4zwvADAZJJS WK884PpxZdKhZ6XGkwNVNwnnVbvW+qis1DD0G1go/usgpF75+eIRKzIkWbSJyE/mEdLG TFA7Enw15K8rMYQQX+nKPr9MuUVlCPwDRYmEwjN9Tv4P4lqKMgRy3cW59mqKd1hZ3wFB DmrA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506181; x=1715110981; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oO6lVL+LVK6xULV3eqZmGPXjUDThyU0i78qX8xOIR9w=; b=mSBLAAerG3t57bJ5Ai3QD4R7PIBheTBz0YKaZpVNJA49PpmOIcltzAgSWPBIEUY0oR iIiYzTujVAI1pZBFuO9kRvXUOctox4mBAKZfTICS2mVNouzC5dMS5fjvbqcQDzjaYH+b 1wSidEbekLPfRB1ewl7m1fb91PsZUhuo6HAgMo0Vs3HIZGHMinMGPGfjUWbwMFTgdVWB +/eGoEk48ht/po/QdgNJ97LInel/132oGrtK0Z89oQlXYTgYUBdrfm2uvmvClTFSy16e o4E71YTGXQzepHgYFvPV0L96EF5PZRxAyR4Ag8C8yd8utHpegcNa/tQ5JhE1bMPMUqx7 a4gQ== X-Gm-Message-State: AOJu0YzohwT5Qr5euf6ZjOHY+1xaiAilxTOLtCZvKXVhXJ/PaTKegDWl fcMmfmVdMht4iOnZsFbobQNNY1jc56CCaVqyhhYvkCqy78MRW6LATPEJuknpapP9ci+Zfc/lok3 1 X-Received: by 2002:a05:6e02:19c7:b0:36a:2a57:9393 with SMTP id r7-20020a056e0219c700b0036a2a579393mr950428ill.3.1714506181328; Tue, 30 Apr 2024 12:43:01 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.43.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:43:00 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org Subject: [PATCH v7 07/10] util/bufferiszero: Introduce biz_accel_fn typedef Date: Tue, 30 Apr 2024 12:42:50 -0700 Message-Id: <20240430194253.904768-8-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::d32; envelope-from=richard.henderson@linaro.org; helo=mail-io1-xd32.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- util/bufferiszero.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index c9a7ded016..f9af7841ba 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -26,7 +26,8 @@ #include "qemu/bswap.h" #include "host/cpuinfo.h" -static bool (*buffer_is_zero_accel)(const void *, size_t); +typedef bool (*biz_accel_fn)(const void *, size_t); +static biz_accel_fn buffer_is_zero_accel; static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { @@ -184,7 +185,7 @@ select_accel_cpuinfo(unsigned info) /* Array is sorted in order of algorithm preference. */ static const struct { unsigned bit; - bool (*fn)(const void *, size_t); + biz_accel_fn fn; } all[] = { #ifdef CONFIG_AVX2_OPT { CPUINFO_AVX2, buffer_zero_avx2 }, @@ -231,7 +232,7 @@ bool test_buffer_is_zero_next_accel(void) #define INIT_ACCEL buffer_is_zero_int_ge256 #endif -static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; +static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL; bool buffer_is_zero_ool(const void *buf, size_t len) { From patchwork Tue Apr 30 19:42:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793371 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp438119wrf; Tue, 30 Apr 2024 12:44:33 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWt1UT/ZUM+KmC4Fs3Z6lhJdyg/yQX8y25eKWvI54/u+1V1CZv4tav09uwfWafOVMVdlnIcJNHgkDmT0Slcrc8p X-Google-Smtp-Source: AGHT+IE0936CKaWKLAPPXNqvDKNoU/zjjZ0OMyERAzdUsl2dMQN6/TaBRB7eF8alwilM92t+5U7O X-Received: by 2002:a25:ae1f:0:b0:dcd:a9ad:7d67 with SMTP id a31-20020a25ae1f000000b00dcda9ad7d67mr709000ybj.8.1714506273439; Tue, 30 Apr 2024 12:44:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506273; cv=none; d=google.com; s=arc-20160816; b=B/dy3qq8j/sFJIYVyyAfsGGgWCXMjzJZIias0powr7FpM05Bc6TJ6eeVMM9diV9ELl XbHGp66DGCET1Jpr2IFN3trsVk7B03EkrYN4ALHuFANEwjgdPRbNLcA78QQkGiSFj+Vo 1mrYz4yMJaiOdRqiER1vJ8P8gyAhSA0REDPXi7s7lyzBNCser40a9edj7WV5EH5XXQUb ILq2Kjg/EgsqaNaYWJlj8vu6ktJooTjC2ct559/oEuOkL5TIG8E7pBbnNAOKgxE0Ki6W vbJOv78QVH1PF4qVZG/QeN8bV/a18Dd6yI3pirYCu3GqK77BlDnuJzyPph4OpWvPRGEo QeXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=xZeDc+rGPQ24e0m555YcdhuR3YwVPpI0p6P9+s6hA88=; fh=ycigHkwOPieh2u5PmcA89sNk/qPD1ZkdDlL8Mchfvw8=; b=RhQwoYR0eiRCdHS4JB8BY4T9G68tFpqva0Hm7/DmecEGXB3DbS9jgx/AneednbrXPs oNY4ZSm5cXs+xrR0eFrtBxyyiUVC7TdjNrA9THjmt7Fq5vqTe0qzq9kgjX73GP6qDZSP 5FGJUGG39bIi3rVfXm38rB2M/2+xQdD4SkyCmK8VmYdqZUs7pwMVU5pMFNb2bbowESou ewYXQfOtdhsWKV9O9hhI2Ei6R3vkOtbW2TnzPqFs7U+LDZLPMmYqvJOSmveguG/rpy1/ Jn/FKoyBZNt+8Y/Ehc+cwtzLOh4OZ7y/+liCRuVBnn+BjH8TvH3Qjw3tnBlU5emByNXM 9cdA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="C/g84Sjj"; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id n16-20020a05622a11d000b0043ab23435f2si8174619qtk.149.2024.04.30.12.44.33 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:33 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="C/g84Sjj"; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNb-00074c-KL; Tue, 30 Apr 2024 15:43:07 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNa-00073p-0Y for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:06 -0400 Received: from mail-oa1-x2b.google.com ([2001:4860:4864:20::2b]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNX-0006lC-DN for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:04 -0400 Received: by mail-oa1-x2b.google.com with SMTP id 586e51a60fabf-234db9dde9bso2541710fac.1 for ; Tue, 30 Apr 2024 12:43:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506182; x=1715110982; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xZeDc+rGPQ24e0m555YcdhuR3YwVPpI0p6P9+s6hA88=; b=C/g84SjjRBUlvu6UltOh8AGqIpezWPyrh8XldhnZh5bHOmZkib9fstVJp3xfdZWSPj Fxb38lj3vxHesNT1TZt68lsc7jkyy9xuMQvLLYHfchReWzV2CLP/vugGgwXpiwpJIiRC 94/nWQ1ZZM50R/EgG+KLPSzIykk3+Cx1XgXS8qjl2MdvSViXxGaflAbY/mrdiCvJo1Kt qezumWvMAcRBUD5MmMBaBLWQfRhORle+r52KqYQFU0+/N3er2HXEVfnTDRGP3nHEAybJ VUq6CBuPPRl11Yo74bvYk6NFEJxjZ5wMlAJFSCPJr67n0WOIxgPLrwdr9IoppCa2q4qd dOkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506182; x=1715110982; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xZeDc+rGPQ24e0m555YcdhuR3YwVPpI0p6P9+s6hA88=; b=fZVirYqv05Br5lh2IvsVpP3VGIXy528Ved7HLl2wdmJqzNuVsyltfOzPLXptqWYJbt cvyLefOzft756o70GgfN4A5Oav/cLWj71elKowvGrKKXmOnntKPGawA8cQOPcAuJ0+9X DSIORiis5fSh8/sivZfiKLmH/lv7oZYf0A1VF6yy5Pq/04IgY95wWMDiDVmHYM0G8v/R vi6vsrhJW6n8VWqn5qkzJs6cYBEuNjolfMGELV4s8JSI2/00CCeg56Xg5bECBeTi+tFE HxRUXTEs5rmOCycO2A/KaLZ2AQk+aHtHJumwjLO2mG+kAqxxnlpBF27lw5oJTNVPmg1I QbDw== X-Gm-Message-State: AOJu0YwKy9PgRKs0emUyQYwlF75ycCN0851COE+ZHUj0IadFEYWw02qM SNQSdSXVDY5zCXsS66d2TcayPgevIUt57fUm2F1xUWrfHt5AVoDBucuiaE7xUXBnbf1y3wHeoNU L X-Received: by 2002:a05:6870:f22a:b0:220:873d:dbcc with SMTP id t42-20020a056870f22a00b00220873ddbccmr566264oao.49.1714506182119; Tue, 30 Apr 2024 12:43:02 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.43.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:43:01 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org Subject: [PATCH v7 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Date: Tue, 30 Apr 2024 12:42:51 -0700 Message-Id: <20240430194253.904768-9-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4860:4864:20::2b; envelope-from=richard.henderson@linaro.org; helo=mail-oa1-x2b.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because the three alternatives are monotonic, we don't need to keep a couple of bitmasks, just identify the strongest alternative at startup. Generalize test_buffer_is_zero_next_accel and init_accel by always defining an accel_table array. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé --- util/bufferiszero.c | 81 ++++++++++++++++++++------------------------- 1 file changed, 35 insertions(+), 46 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index f9af7841ba..7218154a13 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -27,7 +27,6 @@ #include "host/cpuinfo.h" typedef bool (*biz_accel_fn)(const void *, size_t); -static biz_accel_fn buffer_is_zero_accel; static bool buffer_is_zero_int_lt256(const void *buf, size_t len) { @@ -179,60 +178,35 @@ buffer_zero_avx2(const void *buf, size_t len) } #endif /* CONFIG_AVX2_OPT */ -static unsigned __attribute__((noinline)) -select_accel_cpuinfo(unsigned info) -{ - /* Array is sorted in order of algorithm preference. */ - static const struct { - unsigned bit; - biz_accel_fn fn; - } all[] = { +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_zero_sse2, #ifdef CONFIG_AVX2_OPT - { CPUINFO_AVX2, buffer_zero_avx2 }, + buffer_zero_avx2, #endif - { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, - }; +}; - for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { - if (info & all[i].bit) { - buffer_is_zero_accel = all[i].fn; - return all[i].bit; - } +static unsigned best_accel(void) +{ + unsigned info = cpuinfo_init(); + +#ifdef CONFIG_AVX2_OPT + if (info & CPUINFO_AVX2) { + return 2; } - return 0; +#endif + return info & CPUINFO_SSE2 ? 1 : 0; } -static unsigned used_accel; - -static void __attribute__((constructor)) init_accel(void) -{ - used_accel = select_accel_cpuinfo(cpuinfo_init()); -} - -#define INIT_ACCEL NULL - -bool test_buffer_is_zero_next_accel(void) -{ - /* - * Accumulate the accelerators that we've already tested, and - * remove them from the set to test this round. We'll get back - * a zero from select_accel_cpuinfo when there are no more. - */ - unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel); - used_accel |= used; - return used; -} #else -bool test_buffer_is_zero_next_accel(void) -{ - return false; -} - -#define INIT_ACCEL buffer_is_zero_int_ge256 +#define best_accel() 0 +static biz_accel_fn const accel_table[1] = { + buffer_is_zero_int_ge256 +}; #endif -static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL; +static biz_accel_fn buffer_is_zero_accel; +static unsigned accel_index; bool buffer_is_zero_ool(const void *buf, size_t len) { @@ -257,3 +231,18 @@ bool buffer_is_zero_ge256(const void *buf, size_t len) { return buffer_is_zero_accel(buf, len); } + +bool test_buffer_is_zero_next_accel(void) +{ + if (accel_index != 0) { + buffer_is_zero_accel = accel_table[--accel_index]; + return true; + } + return false; +} + +static void __attribute__((constructor)) init_accel(void) +{ + accel_index = best_accel(); + buffer_is_zero_accel = accel_table[accel_index]; +} From patchwork Tue Apr 30 19:42:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793370 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp438076wrf; Tue, 30 Apr 2024 12:44:25 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWDMjbswndRX7Hbj0CIEzeNU9UmsevJbBI5WSoyMyZVCqaujv6A4KN7ZgKWAuFGYJkBmgBED97P3ASSm9RiUQjx X-Google-Smtp-Source: AGHT+IHuKMl42fWuTswGSmRG861bqRGRcY/T8ax1IUGZKsz30ddTGDgvMIhO6eP3PjmRr/uY93o1 X-Received: by 2002:a05:6102:241b:b0:47c:3592:f0aa with SMTP id j27-20020a056102241b00b0047c3592f0aamr745991vsi.30.1714506265070; Tue, 30 Apr 2024 12:44:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506265; cv=none; d=google.com; s=arc-20160816; b=oX8P3/2o1FHo16NDBTydhq4yOclIQK5FSFE9KUQ3wdXP+AH3uC9O5YvK1toiOw9xuz ehjM/SJKosciwDr5XnZrTXXeyp06XLFq9c0sn876c5kUuQUkYVeCKrfvcMycyq7hJQJo r873tHfmXDbCWAZksQ8/MpRcxj753kuYwyR+TsB+XyXQVmiXwqjPTrThEf2/l6A2+oqm 8xJIFBv8BwHGaLQKVBmgiAmdIkbq4oyUDrTssLF8N+asmpci1HKVZ27+zFyg+cizV3yW qrqP7a7C+b1StsPi0o9bsuisO0N3WUtVwDnhmhNOWVO1Gt/xu97WmOcJh7ZXcEzENv6f Rt2A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=GFUk4wDI0aciOtV8kFy/BP1LluK4qErzNDWj4G5aLoM=; fh=ycigHkwOPieh2u5PmcA89sNk/qPD1ZkdDlL8Mchfvw8=; b=cfqBUEhwSduTAjKJZc7/+ezsX1H+XtesiXMGqla3L4IXB94pjZZGRR2Gy2ZydXm/mA jn+DJB/RV5Ki8pzVPWlrUXJ9WnUIeic8RdXgdmHotw8JhmYzVW21rmiWMRyWLiRVvk1W p5tNu1IWcKWVT9gaho3rAFAntzcrWKJmXgI5cMTkhg1zdO58J4Aai3QUqjGbN8azX0Qb 6cQlhsqEpJR8CDYxXJSzUsOueVeyTR1Wdas1DxfZyr6ug+5sAValVi73PoOu0GI+8XHo CQBQbsxFbNYZeWQ/hSujBSNKxYi5NUOB/0hmucRaDI92P7Fan/PEHB6tKceyq4r+EBH6 wLjA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Mmn2zE3A; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id jo27-20020a056214501b00b006a0c7f42125si6766468qvb.58.2024.04.30.12.44.24 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:25 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Mmn2zE3A; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNb-00074e-O3; Tue, 30 Apr 2024 15:43:07 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNa-00073z-9Y for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:06 -0400 Received: from mail-ot1-x32d.google.com ([2607:f8b0:4864:20::32d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNY-0006lW-Nq for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:06 -0400 Received: by mail-ot1-x32d.google.com with SMTP id 46e09a7af769-6ee3a49bdcfso1336456a34.0 for ; Tue, 30 Apr 2024 12:43:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506183; x=1715110983; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GFUk4wDI0aciOtV8kFy/BP1LluK4qErzNDWj4G5aLoM=; b=Mmn2zE3ApmmksEd2zSiThKstH6QIuFA7jA8QA3l8SwWcBX55TMpN/R7/p/hkqIdNPo PsKaCP0l1+UV7KJe94wcVJWcqwl+Ps1VtHNBQXYBRXNc8ul2l0VEcIgetcygigM8vBsO 3dStgv6hNqfYUSoxlVgpG97+vcqBDm+VS8EnSZrQZM6aeOMmNqawO86Bh5iuW6tdPb4A /8seYEU7gKjsCmIp7zynpVhLjLFWWYyDUcsqL/JVCiz5pM/7Yud5c0Ylz392GSO4QSUt WOb82SuEcJbPmqWx09l/icmSQVDv2edCPaLEFl11Ce6cd/kYFKD+CCOtSyPe9U0rhIIA bPfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506183; x=1715110983; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GFUk4wDI0aciOtV8kFy/BP1LluK4qErzNDWj4G5aLoM=; b=JqrYVKCRsWHT2Nv6DWXk0uvFDD+cjiZ7A7A1DXwYpnEhhiHYAzSuirEGDgOQE0IKK7 Gf/ydCffzVMFlgmELQUV+nH7D70KOB+KF07muz4z1SneHtO61L5aP+iKjwRpeK93qJne GJK/utREhkmGaHlaRD5LSAyT71Y0UmQ97YOyQrRmd+yhDnsYY25Ud9VCXEfMPnImPj+J bt6fsBTc6AuLxumrW8eYVrCPOfbpcO3J5Zam1pb8mXcpKYOP4OVnLqB4J1qTE2PSuXqC ZVOsQBj8Vv1to43wPLPrMxEtBlFTwWqsg3Vw916rH4do1CIKTRNiwMtZaRzY2aSM5PAu 1Tfg== X-Gm-Message-State: AOJu0YxGi+j3yg9OPHiC2ESwXMUaZ6dosAywkugfj4TcvLkItLtdOOSc JfoeY11wchoOHMbAj0Z4QISkIEfXtj611jOYtHGVyoTDEolgfbDMyyRfSbcJbTUwNKSrIIExwMy B X-Received: by 2002:a05:6870:2481:b0:23c:471:a5d2 with SMTP id s1-20020a056870248100b0023c0471a5d2mr535258oaq.30.1714506183211; Tue, 30 Apr 2024 12:43:03 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.43.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:43:02 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org Subject: [PATCH v7 09/10] util/bufferiszero: Add simd acceleration for aarch64 Date: Tue, 30 Apr 2024 12:42:52 -0700 Message-Id: <20240430194253.904768-10-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::32d; envelope-from=richard.henderson@linaro.org; helo=mail-ot1-x32d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely double-check with the compiler flags for __ARM_NEON and don't bother with a runtime check. Otherwise, model the loop after the x86 SSE2 function. Use UMAXV for the vector reduction. This is 3 cycles on cortex-a76 and 2 cycles on neoverse-n1. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- util/bufferiszero.c | 67 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index 7218154a13..74864f7b78 100644 --- a/util/bufferiszero.c +++ b/util/bufferiszero.c @@ -198,6 +198,73 @@ static unsigned best_accel(void) return info & CPUINFO_SSE2 ? 1 : 0; } +#elif defined(__aarch64__) && defined(__ARM_NEON) +#include + +/* + * Helper for preventing the compiler from reassociating + * chains of binary vector operations. + */ +#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1)) + +static bool buffer_is_zero_simd(const void *buf, size_t len) +{ + uint32x4_t t0, t1, t2, t3; + + /* Align head/tail to 16-byte boundaries. */ + const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16); + const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16); + + /* Unaligned loads at head/tail. */ + t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16); + + /* Collect a partial block at tail end. */ + t1 = e[-7] | e[-6]; + t2 = e[-5] | e[-4]; + t3 = e[-3] | e[-2]; + t0 |= e[-1]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + + /* + * Loop over complete 128-byte blocks. + * With the head and tail removed, e - p >= 14, so the loop + * must iterate at least once. + */ + do { + /* + * Reduce via UMAXV. Whatever the actual result, + * it will only be zero if all input bytes are zero. + */ + if (unlikely(vmaxvq_u32(t0) != 0)) { + return false; + } + + t0 = p[0] | p[1]; + t1 = p[2] | p[3]; + t2 = p[4] | p[5]; + t3 = p[6] | p[7]; + REASSOC_BARRIER(t0, t1); + REASSOC_BARRIER(t2, t3); + t0 |= t1; + t2 |= t3; + REASSOC_BARRIER(t0, t2); + t0 |= t2; + p += 8; + } while (p < e - 7); + + return vmaxvq_u32(t0) == 0; +} + +#define best_accel() 1 +static biz_accel_fn const accel_table[] = { + buffer_is_zero_int_ge256, + buffer_is_zero_simd, +}; #else #define best_accel() 0 static biz_accel_fn const accel_table[1] = { From patchwork Tue Apr 30 19:42:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 793367 Delivered-To: patch@linaro.org Received: by 2002:a05:6000:54f:b0:34d:5089:5a9e with SMTP id b15csp438012wrf; Tue, 30 Apr 2024 12:44:12 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWT3fG8lIKaOzRB3G60jvAP8hnQs5piSlcSOvwM5yF/rAI19/p26gk/z+YYlGvpwv05egcwPIyJi5D7bc6Gga/O X-Google-Smtp-Source: AGHT+IGNP6G/O0Jj4Djv0nroZRAcYU9nYXp9sfCAdOxufYyqC2kiw6JYteN2q0i6easRjpn/LlTa X-Received: by 2002:a25:dc84:0:b0:de6:1643:f3cd with SMTP id y126-20020a25dc84000000b00de61643f3cdmr594230ybe.44.1714506252468; Tue, 30 Apr 2024 12:44:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714506252; cv=none; d=google.com; s=arc-20160816; b=O+LpikXTLKBPcIuxJqKkrM+C+OILs73B8v4gXC/etBM9PCc9tas5iBMsoOWQgJfZfJ DlfYnQivWb6L4kFxDGZHchz1UDAGUKZg2X/O/1PpGyg8RnQygyTr7jtn7yngBhI4L4t+ sdZEAV3qc2Dl7oo3FYh0KaTthQBrr5ltVOhUp8/5Tixk+Q25niAn11DqjwbqUbsjQf/c bjZyGO3wwF9cQP7LR392/T9Zcnpr6n6HL4NlzkvGwZM6pLSH2gcrCZDntZ9t6yOHEvZL 2idvx1VmUdwZvp1890a2ovniW6Nktn28Q6TMJl5j6prVailfu4somc+NPx9OTrQo/xBo BtCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=P8aiuTj5/6yjo0wKZf1j2mdDbeTtUm906YMvC+PIZPk=; fh=ycigHkwOPieh2u5PmcA89sNk/qPD1ZkdDlL8Mchfvw8=; b=kx+0nrnH7D2ByglXaipNldP5wzfxz9IvzPMj6pLFp/r7lXQJ9j0SO1N8FbtYWUOLC+ WNTKjGrIk4GLtWlvzGUWBfxaYoE2EcEzmIAtZ3N8TSkQHovARf3dmoVPLPvS2T+81bqC VijgHJkVBzYYqSjqk49voPOA8zdMXK+bLMYuGubdMH3rrn/iuaUsZTyyC6jLeFkuIdLj cQiuHzx/Y+A76H4dlE2G39eo48fAc3D6ryoiadQ2SoMqLJGT0/755ocD+NWzWHkCz0wx q0+wtdaKhxrDGDnHRSijDiBWwSFtCi5vs+VhkZjT/aXCx9z1AsvqnSKNi4/9DqyS4QID u+bw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=KWkETIoy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id t4-20020a05622a180400b0043b08a90973si4046949qtc.107.2024.04.30.12.44.12 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Tue, 30 Apr 2024 12:44:12 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=KWkETIoy; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s1tNj-00075y-Nh; Tue, 30 Apr 2024 15:43:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1tNi-00075j-2n for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:14 -0400 Received: from mail-oo1-xc32.google.com ([2607:f8b0:4864:20::c32]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s1tNa-0006li-Ht for qemu-devel@nongnu.org; Tue, 30 Apr 2024 15:43:13 -0400 Received: by mail-oo1-xc32.google.com with SMTP id 006d021491bc7-5aa2551d33dso4121499eaf.0 for ; Tue, 30 Apr 2024 12:43:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714506184; x=1715110984; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=P8aiuTj5/6yjo0wKZf1j2mdDbeTtUm906YMvC+PIZPk=; b=KWkETIoyPXdYR+JOUg8Dxlxk7DS5MqnBbiKGSAp9Ov0cyrPPYPsVdFiHQctGXwmZeW Y6mRaRfbUVQ+Ka5lUWjWaFrqM0DEWq8jzj9yU3CJhUlxpbzOuhJM+HN5uTdKaokwf2EG VIO+GEgDWPVWYv9Ylpc0255cLaXdvKB8STVLjZqMuACCNUxgWuKCtjX+2fQCxdWnGkMB QcMns2Ytk/AUouO49WHxPRS4aoAMNUsE7okWFf69Lqr/RDWferasLaKYRbcHNUQi0wXR iyMN7k8Q+lUNyrbgLZKS0pzFNaXHRNIrvPbettxoTJ0Q5BZMZNvA5W6W6FHj3d6RYktR H4Ug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714506184; x=1715110984; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=P8aiuTj5/6yjo0wKZf1j2mdDbeTtUm906YMvC+PIZPk=; b=HW2QQx9l1x10Totkk6iNDPdCwhh1BVjLwMrI4r71oTc/SJpU4QWrr62aUvR8RqKBCU J0dyupfFnxAVFU6/hLXYv4XQ8DK4ZWR/Jwhx0vjzJxFygi9FNyOvuBUYUQEeuQtwjDXG a6whd5xYOF4c1IPxZZlsOohKEo2F28Jy4GFX4odVum3JdHhVW0G6wVE4hXHB9gMlxS2f XJX4OpGW3mseU4FzIcpsbhhLeFXAe7QEy8G6DLK++XvWgrAXAlbRlKjqNW+VjZ0sWPLI 6YyE8mRGXJqdmXe/yR86JlDUs/1T4n7A3FboAaKByDRY4WgiXlRUJ+hG2fnZwhbLReX/ Pt/g== X-Gm-Message-State: AOJu0YxsPdrIyhdFxxm75JnDUXqUZlzBxJ2Geg0RlVeiCLQjAfsHgEs1 FKuIoXZE7XO/iItsenHs0JK6XqnfNouxs8DqMF17Z44dUrdvhbeUuTIswLo551c0sywBVXCRJDn F X-Received: by 2002:a05:6358:988b:b0:17f:729a:8562 with SMTP id q11-20020a056358988b00b0017f729a8562mr391176rwa.3.1714506184111; Tue, 30 Apr 2024 12:43:04 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. [174.21.72.5]) by smtp.gmail.com with ESMTPSA id d12-20020a63360c000000b005d880b41598sm20861523pga.94.2024.04.30.12.43.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Apr 2024 12:43:03 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: philmd@linaro.org Subject: [PATCH v7 10/10] tests/bench: Add bufferiszero-bench Date: Tue, 30 Apr 2024 12:42:53 -0700 Message-Id: <20240430194253.904768-11-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240430194253.904768-1-richard.henderson@linaro.org> References: <20240430194253.904768-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::c32; envelope-from=richard.henderson@linaro.org; helo=mail-oo1-xc32.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org Benchmark each acceleration function vs an aligned buffer of zeros. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- tests/bench/bufferiszero-bench.c | 47 ++++++++++++++++++++++++++++++++ tests/bench/meson.build | 1 + 2 files changed, 48 insertions(+) create mode 100644 tests/bench/bufferiszero-bench.c diff --git a/tests/bench/bufferiszero-bench.c b/tests/bench/bufferiszero-bench.c new file mode 100644 index 0000000000..222695c1fa --- /dev/null +++ b/tests/bench/bufferiszero-bench.c @@ -0,0 +1,47 @@ +/* + * QEMU buffer_is_zero speed benchmark + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * (at your option) any later version. See the COPYING file in the + * top-level directory. + */ +#include "qemu/osdep.h" +#include "qemu/cutils.h" +#include "qemu/units.h" + +static void test(const void *opaque) +{ + size_t max = 64 * KiB; + void *buf = g_malloc0(max); + int accel_index = 0; + + do { + if (accel_index != 0) { + g_test_message("%s", ""); /* gnu_printf Werror for simple "" */ + } + for (size_t len = 1 * KiB; len <= max; len *= 4) { + double total = 0.0; + + g_test_timer_start(); + do { + buffer_is_zero_ge256(buf, len); + total += len; + } while (g_test_timer_elapsed() < 0.5); + + total /= MiB; + g_test_message("buffer_is_zero #%d: %2zuKB %8.0f MB/sec", + accel_index, len / (size_t)KiB, + total / g_test_timer_last()); + } + accel_index++; + } while (test_buffer_is_zero_next_accel()); + + g_free(buf); +} + +int main(int argc, char **argv) +{ + g_test_init(&argc, &argv, NULL); + g_test_add_data_func("/cutils/bufferiszero/speed", NULL, test); + return g_test_run(); +} diff --git a/tests/bench/meson.build b/tests/bench/meson.build index 7e76338a52..4cd7a2f6b5 100644 --- a/tests/bench/meson.build +++ b/tests/bench/meson.build @@ -21,6 +21,7 @@ benchs = {} if have_block benchs += { + 'bufferiszero-bench': [], 'benchmark-crypto-hash': [crypto], 'benchmark-crypto-hmac': [crypto], 'benchmark-crypto-cipher': [crypto],