From patchwork Fri May 3 15:13:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794384
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: Alexander Monakov, Mikhail Romanov
Subject: [PULL 01/10] util/bufferiszero: Remove SSE4.1 variant
Date: Fri, 3 May 2024 08:13:05 -0700
Message-Id: <20240503151314.336357-2-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

From: Alexander Monakov

The SSE4.1 variant is virtually identical to the SSE2 variant, except
for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing
if an SSE register is all zeroes. The PTEST instruction decodes to two
uops, so it can be handled only by the complex decoder, and since
CMP+JNE are macro-fused, both sequences decode to three uops. The uops
comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so
PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch
standpoint.

Hence, the use of PTEST brings no benefit from throughput standpoint.
Its latency is not important, since it feeds only a conditional jump,
which terminates the dependency chain.

I never observed PTEST variants to be faster on real hardware.

Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
Reviewed-by: Richard Henderson
Message-Id: <20240206204809.9859-2-amonakov@ispras.ru>
---
 util/bufferiszero.c | 29 -----------------------------
 1 file changed, 29 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 3e6a5dfd63..f5a3634f9a 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len)
 }
 
 #ifdef CONFIG_AVX2_OPT
-static bool __attribute__((target("sse4")))
-buffer_zero_sse4(const void *buf, size_t len)
-{
-    __m128i t = _mm_loadu_si128(buf);
-    __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16);
-    __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16);
-
-    /* Loop over 16-byte aligned blocks of 64. */
-    while (likely(p <= e)) {
-        __builtin_prefetch(p);
-        if (unlikely(!_mm_testz_si128(t, t))) {
-            return false;
-        }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
-
-    /* Finish the aligned tail. */
-    t |= e[-3];
-    t |= e[-2];
-    t |= e[-1];
-
-    /* Finish the unaligned tail. */
-    t |= _mm_loadu_si128(buf + len - 16);
-
-    return _mm_testz_si128(t, t);
-}
-
 static bool __attribute__((target("avx2")))
 buffer_zero_avx2(const void *buf, size_t len)
 {
@@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info)
 #endif
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2, 128, buffer_zero_avx2 },
-        { CPUINFO_SSE4, 64, buffer_zero_sse4 },
 #endif
         { CPUINFO_SSE2, 64, buffer_zero_sse2 },
         { CPUINFO_ALWAYS, 0, buffer_zero_int },

From patchwork Fri May 3 15:13:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794385
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: Alexander Monakov, Mikhail Romanov
Subject: [PULL 02/10] util/bufferiszero: Remove AVX512 variant
Date: Fri, 3 May 2024 08:13:06 -0700
Message-Id: <20240503151314.336357-3-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

From: Alexander Monakov

Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
routines are invoked much more rarely in normal use when most buffers
are non-zero.
This makes use of AVX512 unprofitable, as it incurs extra
frequency and voltage transition periods during which the CPU operates
at reduced performance, as described in
https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

Signed-off-by: Mikhail Romanov
Signed-off-by: Alexander Monakov
Reviewed-by: Richard Henderson
Message-Id: <20240206204809.9859-4-amonakov@ispras.ru>
Signed-off-by: Richard Henderson
---
 util/bufferiszero.c | 38 +++-----------------------------------
 1 file changed, 3 insertions(+), 35 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index f5a3634f9a..641d5f9b9e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len)
     }
 }
 
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
+#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include <immintrin.h>
 
 /* Note that each of these vectorized functions require len >= 64. */
@@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-#ifdef CONFIG_AVX512F_OPT
-static bool __attribute__((target("avx512f")))
-buffer_zero_avx512(const void *buf, size_t len)
-{
-    /* Begin with an unaligned head of 64 bytes. */
-    __m512i t = _mm512_loadu_si512(buf);
-    __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
-    __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64);
-
-    /* Loop over 64-byte aligned blocks of 256. */
-    while (p <= e) {
-        __builtin_prefetch(p);
-        if (unlikely(_mm512_test_epi64_mask(t, t))) {
-            return false;
-        }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
-
-    t |= _mm512_loadu_si512(buf + len - 4 * 64);
-    t |= _mm512_loadu_si512(buf + len - 3 * 64);
-    t |= _mm512_loadu_si512(buf + len - 2 * 64);
-    t |= _mm512_loadu_si512(buf + len - 1 * 64);
-
-    return !_mm512_test_epi64_mask(t, t);
-
-}
-#endif /* CONFIG_AVX512F_OPT */
-
 /*
  * Make sure that these variables are appropriately initialized when
  * SSE2 is enabled on the compiler command-line, but the compiler is
  * too old to support CONFIG_AVX2_OPT.
  */
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 # define INIT_USED 0
 # define INIT_LENGTH 0
 # define INIT_ACCEL buffer_zero_int
@@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info)
         unsigned len;
         bool (*fn)(const void *, size_t);
     } all[] = {
-#ifdef CONFIG_AVX512F_OPT
-        { CPUINFO_AVX512F, 256, buffer_zero_avx512 },
-#endif
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2, 128, buffer_zero_avx2 },
 #endif
@@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info)
     return 0;
 }
 
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 static void __attribute__((constructor)) init_accel(void)
 {
     used_accel = select_accel_cpuinfo(cpuinfo_init());

From patchwork Fri May 3 15:13:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794387
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: Alexander Monakov, Mikhail Romanov
Subject: [PULL 03/10] util/bufferiszero: Reorganize for early test for acceleration
Date: Fri, 3 May 2024 08:13:07 -0700
Message-Id: <20240503151314.336357-4-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

From: Alexander Monakov

Test for length >= 256 inline, where it is often a constant.
Before calling into the accelerated routine, sample three bytes
from the buffer, which handles most non-zero buffers.

Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
Message-Id: <20240206204809.9859-3-amonakov@ispras.ru>
[rth: Use __builtin_constant_p; move the indirect call out of line.]
Signed-off-by: Richard Henderson
---
 include/qemu/cutils.h | 32 ++++++++++++++++-
 util/bufferiszero.c   | 84 +++++++++++++++++--------------------------
 2 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 92c927a6a3..741dade7cf 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -187,9 +187,39 @@ char *freq_to_str(uint64_t freq_hz);
 /* used to print char* safely */
 #define STR_OR_NULL(str) ((str) ? (str) : "null")
 
-bool buffer_is_zero(const void *buf, size_t len);
+/*
+ * Check if a buffer is all zeroes.
+ */
+
+bool buffer_is_zero_ool(const void *vbuf, size_t len);
+bool buffer_is_zero_ge256(const void *vbuf, size_t len);
 bool test_buffer_is_zero_next_accel(void);
 
+static inline bool buffer_is_zero_sample3(const char *buf, size_t len)
+{
+    /*
+     * For any reasonably sized buffer, these three samples come from
+     * three different cachelines. In qemu-img usage, we find that
+     * each byte eliminates more than half of all buffer testing.
+     * It is therefore critical to performance that the byte tests
+     * short-circuit, so that we do not pull in additional cache lines.
+     * Do not "optimize" this to !(a | b | c).
+     */
+    return !buf[0] && !buf[len - 1] && !buf[len / 2];
+}
+
+#ifdef __OPTIMIZE__
+static inline bool buffer_is_zero(const void *buf, size_t len)
+{
+    return (__builtin_constant_p(len) && len >= 256
+            ? buffer_is_zero_sample3(buf, len) &&
+              buffer_is_zero_ge256(buf, len)
+            : buffer_is_zero_ool(buf, len));
+}
+#else
+#define buffer_is_zero buffer_is_zero_ool
+#endif
+
 /*
  * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128)
  * Input is limited to 14-bit numbers
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 641d5f9b9e..972f394cbd 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -26,8 +26,9 @@
 #include "qemu/bswap.h"
 #include "host/cpuinfo.h"
 
-static bool
-buffer_zero_int(const void *buf, size_t len)
+static bool (*buffer_is_zero_accel)(const void *, size_t);
+
+static bool buffer_is_zero_integer(const void *buf, size_t len)
 {
     if (unlikely(len < 8)) {
         /* For a very small buffer, simply accumulate all the bytes. */
@@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-/*
- * Make sure that these variables are appropriately initialized when
- * SSE2 is enabled on the compiler command-line, but the compiler is
- * too old to support CONFIG_AVX2_OPT.
- */
-#if defined(CONFIG_AVX2_OPT)
-# define INIT_USED 0
-# define INIT_LENGTH 0
-# define INIT_ACCEL buffer_zero_int
-#else
-# ifndef __SSE2__
-# error "ISA selection confusion"
-# endif
-# define INIT_USED CPUINFO_SSE2
-# define INIT_LENGTH 64
-# define INIT_ACCEL buffer_zero_sse2
-#endif
-
-static unsigned used_accel = INIT_USED;
-static unsigned length_to_accel = INIT_LENGTH;
-static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL;
-
 static unsigned __attribute__((noinline))
 select_accel_cpuinfo(unsigned info)
 {
     /* Array is sorted in order of algorithm preference. */
     static const struct {
         unsigned bit;
-        unsigned len;
         bool (*fn)(const void *, size_t);
     } all[] = {
 #ifdef CONFIG_AVX2_OPT
-        { CPUINFO_AVX2, 128, buffer_zero_avx2 },
+        { CPUINFO_AVX2, buffer_zero_avx2 },
 #endif
-        { CPUINFO_SSE2, 64, buffer_zero_sse2 },
-        { CPUINFO_ALWAYS, 0, buffer_zero_int },
+        { CPUINFO_SSE2, buffer_zero_sse2 },
+        { CPUINFO_ALWAYS, buffer_is_zero_integer },
     };
 
     for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
         if (info & all[i].bit) {
-            length_to_accel = all[i].len;
-            buffer_accel = all[i].fn;
+            buffer_is_zero_accel = all[i].fn;
             return all[i].bit;
         }
     }
     return 0;
 }
 
-#if defined(CONFIG_AVX2_OPT)
+static unsigned used_accel;
+
 static void __attribute__((constructor)) init_accel(void)
 {
     used_accel = select_accel_cpuinfo(cpuinfo_init());
 }
-#endif /* CONFIG_AVX2_OPT */
+
+#define INIT_ACCEL NULL
 
 bool test_buffer_is_zero_next_accel(void)
 {
@@ -194,36 +173,37 @@ bool test_buffer_is_zero_next_accel(void)
     used_accel |= used;
     return used;
 }
-
-static bool select_accel_fn(const void *buf, size_t len)
-{
-    if (likely(len >= length_to_accel)) {
-        return buffer_accel(buf, len);
-    }
-    return buffer_zero_int(buf, len);
-}
-
 #else
-#define select_accel_fn buffer_zero_int
 bool test_buffer_is_zero_next_accel(void)
 {
     return false;
 }
+
+#define INIT_ACCEL buffer_is_zero_integer
 #endif
 
-/*
- * Checks if a buffer is all zeroes
- */
-bool buffer_is_zero(const void *buf, size_t len)
+static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL;
+
+bool buffer_is_zero_ool(const void *buf, size_t len)
 {
     if (unlikely(len == 0)) {
         return true;
     }
+    if (!buffer_is_zero_sample3(buf, len)) {
+        return false;
+    }
+    /* All bytes are covered for any len <= 3. */
+    if (unlikely(len <= 3)) {
+        return true;
+    }
 
-    /* Fetch the beginning of the buffer while we select the accelerator. */
-    __builtin_prefetch(buf);
-
-    /* Use an optimized zero check if possible. Note that this also
-       includes a check for an unrolled loop over 64-bit integers. */
-    return select_accel_fn(buf, len);
+    if (likely(len >= 256)) {
+        return buffer_is_zero_accel(buf, len);
+    }
+    return buffer_is_zero_integer(buf, len);
+}
+
+bool buffer_is_zero_ge256(const void *buf, size_t len)
+{
+    return buffer_is_zero_accel(buf, len);
 }

From patchwork Fri May 3 15:13:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794389
Delivered-To: patch@linaro.org Received: by 2002:adf:a153:0:b0:34d:5089:5a9e with SMTP id r19csp789055wrr; Fri, 3 May 2024 08:14:56 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWJ0bqccObsuzAdI8kag9AtGJHH7RdFHffXIP0zEMHTNhbxZHGu5SPi1RP3g2YeZ0fCPOvAGN4YmkbGie20rgui X-Google-Smtp-Source: AGHT+IF+gUWn2KrGAumfMT5Z1JusADyxOsdwNxjny9ayYzDyfEYnE0swAcPMPLDpa7gz99/4neA9 X-Received: by 2002:a05:620a:c11:b0:790:865b:5b5c with SMTP id l17-20020a05620a0c1100b00790865b5b5cmr2572860qki.54.1714749296280; Fri, 03 May 2024 08:14:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714749296; cv=none; d=google.com; s=arc-20160816; b=AXegePTFZJFqZUBjD8wJUlZJwDRRG63/mTHIGN3gMPmurQxX6/Px45x0yCUBYeumuL X/O/QuUVw7y3bwxA6aBSzywZt6QzSFRVk6seFTyIptuVdtP5rJB7ztYQKwPQInpHRFmi e0sl2QrLKxrQysIujPRIQ/preNlXxz2ZV4SD7Pd0fvE8q/RajqzDIiR8yQvNiHyp+F2Q OutIS8lAdqpZNEnxsLYt73EDCIVTjpLbaBPLJ+JDC/6FsVYZ23hL3a+khgDGrZhiuykx KtGcqTo4IshRiRf3pNdnwhBiz0QaFIBVEA3fbLfbUktlmiPeoLxzbTZ6EqCw85Sej+o6 S87w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; fh=IOfZmL/4G2LGBtSV+LzySu7eotL7HJ1AQcRx3etIBXU=; b=u+CnLLGCKheAewsIBJv8jeH6xWQMWFwlMd6tW3Ngk3o3xQB5jVEZ9BmfAqDg2/+ADX /f6p7nvUWJ0yJu3bf10LVbOg9FeJTzn8udmFKEwKfsNLV+npnt7LH5Fyf6acQPz67qJ2 
3Xr2HWXWkUkx73yxPgx6PcY0x+rbHZiML83+t8DkjbURQh7UN8hazdTHpvv/C8QbMsy4 7KOXufAKMiU7q1Rykyzkpv//OKncnsbnS8vzI1wubnpEX3G4mYQI023zu3JS4YYnCDb+ aQX3/ebj0Nnx/RJaLJWdjKPA2TEUrJ9zllA5jpKMlUJIafHb+wmMW+X9Zlt/Gin4nYXE IRNA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=xqMN0+92; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id b18-20020a05620a0cd200b00790fbaa289dsi3198914qkj.462.2024.05.03.08.14.56 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 03 May 2024 08:14:56 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=xqMN0+92; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1s2ubI-0005nP-UR; Fri, 03 May 2024 11:13:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s2ubE-0005js-9o for qemu-devel@nongnu.org; Fri, 03 May 2024 11:13:25 -0400 Received: from mail-pl1-x635.google.com ([2607:f8b0:4864:20::635]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1s2ubB-00075T-Im for 
qemu-devel@nongnu.org; Fri, 03 May 2024 11:13:24 -0400 Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-1e5c7d087e1so80051375ad.0 for ; Fri, 03 May 2024 08:13:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1714749200; x=1715354000; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=xqMN0+92TlmXuJGGogFlVfpExHPOiiw3S3BMIqq9X5Cw7mI4fC2To0ZbE59wVLIeZZ D9CbH1RNsHdVE9riIZFjjWyCR2/oTQ2iVkJKdm1+UXMsdULZZfVDtFn+Xy2Z6rMQ/cEv GwyLUBJFexZJvA/hdyIRCThcbGw9ReBFrLs3W7r9bC+xJLo9VtLc0J/5CqyjLeOKP2rW LQVY3H0SDUFgExEVHyySnVRPfk9BTFWsLgcIz62xB4ZEbPq0SWSbQEdeje3ybmvwtCwY glCz39W3x+1wY5xamsr/QAm1neov/B8f9O1G91I0LAt9KT8VuEj8X2KG3j2ZlJuQ958A WdmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714749200; x=1715354000; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RPMJ9Fq7Hagw2E+aglC+ITwqURcRV1FccXRLb5FLQFU=; b=itF0BiPxW1MkcuCP437HaGubivxisGew5XYsFlMZHGHRTnMa/EcS15XA2bbtZVT1lT FLjwFugmN3erVCu5IwxzJS/q9xSUH913Pa6dqZ+Wlx3Egdp9//rGuLBIGsy36s2s7s7M EJQaklrDLqihCeGhzSHD4viH2nnU+eHNW/p+7HdhDjtKcV8Xzl30FedVmiZCfly/pO0y ld8r+A8RERR0jhZ0c96f0EjAK0aOss1ufm+Xb0g9toRY1thWntV8YbyWauRCllWGdd8X mnIhmHwuwEXPZxxLa3kFJqn4Badaegzihjsn/peTJPQlGsEU9Jn1ZWMC9HA/D8ZQFvfy p9tA== X-Gm-Message-State: AOJu0YwIkw5BVD0nkH59PZ/laHN66ToYWKPHMNgDHyHg7g93eCaSdVW1 ct1VWuqL1MCPEn+bE+1b4ixg6rFEVP8qz0JP3C9eBiWtCL/9ljRHtq1uKnLCf8eyVRH8TyuNkTH s X-Received: by 2002:a17:903:2b07:b0:1e4:c8b3:dbaf with SMTP id mc7-20020a1709032b0700b001e4c8b3dbafmr3106172plb.33.1714749200063; Fri, 03 May 2024 08:13:20 -0700 (PDT) Received: from stoup.. (174-21-72-5.tukw.qwest.net. 
[174.21.72.5]) by smtp.gmail.com with ESMTPSA id p10-20020a170902c70a00b001e81c778784sm3366611plp.67.2024.05.03.08.13.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 03 May 2024 08:13:19 -0700 (PDT) From: Richard Henderson To: qemu-devel@nongnu.org Cc: Alexander Monakov , Mikhail Romanov Subject: [PULL 04/10] util/bufferiszero: Remove useless prefetches Date: Fri, 3 May 2024 08:13:08 -0700 Message-Id: <20240503151314.336357-5-richard.henderson@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org> References: <20240503151314.336357-1-richard.henderson@linaro.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::635; envelope-from=richard.henderson@linaro.org; helo=mail-pl1-x635.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Alexander Monakov Use of prefetching in bufferiszero.c is quite questionable: - prefetches are issued just a few CPU cycles before the corresponding line would be hit by demand loads; - they are done for simple access patterns, i.e. where hardware prefetchers can perform better; - they compete for load ports in loops that should be limited by load port throughput rather than ALU throughput. 
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
Reviewed-by: Richard Henderson
Message-Id: <20240206204809.9859-5-amonakov@ispras.ru>
---
 util/bufferiszero.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 972f394cbd..00118d649e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len)
         const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);

         for (; p + 8 <= e; p += 8) {
-            __builtin_prefetch(p + 8);
             if (t) {
                 return false;
             }
@@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len)

     /* Loop over 16-byte aligned blocks of 64.  */
     while (likely(p <= e)) {
-        __builtin_prefetch(p);
         t = _mm_cmpeq_epi8(t, zero);
         if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) {
             return false;
@@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len)

     /* Loop over 32-byte aligned blocks of 128.  */
     while (p <= e) {
-        __builtin_prefetch(p);
         if (unlikely(!_mm256_testz_si256(t, t))) {
             return false;
         }

From patchwork Fri May 3 15:13:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794390
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: Alexander Monakov, Mikhail Romanov
Subject: [PULL 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants
Date: Fri, 3 May 2024 08:13:09 -0700
Message-Id: <20240503151314.336357-6-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

From: Alexander Monakov

Increase unroll factor in SIMD loops from 4x to 8x in order to move
their bottlenecks from ALU port contention to load issue rate (two loads
per cycle on popular x86 implementations).

Avoid using out-of-bounds pointers in loop boundary conditions.

Follow SSE2 implementation strategy in the AVX2 variant. Avoid use
of PTEST, which is not profitable there (like in the removed SSE4
variant).

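The loop shape described above can be illustrated with a scalar sketch.
This is not code from the patch: words_are_zero_unroll8 is a hypothetical
helper that uses uint64_t words in place of vector registers, and it omits
SSE_REASSOC_BARRIER, so a compiler is free to merge the two OR chains. It
only shows the structure: two accumulators, eight loads per iteration, the
tail block collected up front, and a loop bound (e - 7) that never forms a
pointer outside the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for the 8x-unrolled SIMD loops. */
static bool words_are_zero_unroll8(const uint64_t *p, size_t nwords)
{
    /* Precondition of this sketch: at least one full 8-word block. */
    assert(nwords >= 8);

    const uint64_t *e = p + nwords;   /* one past the end, which is valid */
    uint64_t v, w;

    /*
     * Collect the last 8 words up front.  They may overlap the final
     * loop iteration; OR-ing a word twice does not change the result.
     */
    v = e[-1] | e[-3] | e[-5] | e[-7];
    w = e[-2] | e[-4] | e[-6] | e[-8];
    v |= w;

    /* e - 7 points inside the array because nwords >= 8. */
    for (; p < e - 7; p += 8) {
        /* Test the previously accumulated block, then load the next. */
        if (v) {
            return false;
        }
        v = p[0]; w = p[1];
        v |= p[2]; w |= p[3];
        v |= p[4]; w |= p[5];
        v |= p[6]; w |= p[7];
        v |= w;
    }
    return v == 0;
}
```

Testing a block one iteration after it was loaded keeps the branch off the
critical path of the loads, which is the same ordering the vector loops in
the patch use.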
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
Reviewed-by: Richard Henderson
Message-Id: <20240206204809.9859-6-amonakov@ispras.ru>
---
 util/bufferiszero.c | 111 +++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 38 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 00118d649e..02df82b4ff 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len)
 #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include

-/* Note that each of these vectorized functions require len >= 64.  */
+/* Helper for preventing the compiler from reassociating
+   chains of binary vector operations.  */
+#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1))
+
+/* Note that these vectorized functions may assume len >= 256.  */

 static bool __attribute__((target("sse2")))
 buffer_zero_sse2(const void *buf, size_t len)
 {
-    __m128i t = _mm_loadu_si128(buf);
-    __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16);
-    __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16);
-    __m128i zero = _mm_setzero_si128();
+    /* Unaligned loads at head/tail.  */
+    __m128i v = *(__m128i_u *)(buf);
+    __m128i w = *(__m128i_u *)(buf + len - 16);
+    /* Align head/tail to 16-byte boundaries.  */
+    const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16);
+    const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16);
+    __m128i zero = { 0 };

-    /* Loop over 16-byte aligned blocks of 64.  */
-    while (likely(p <= e)) {
-        t = _mm_cmpeq_epi8(t, zero);
-        if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) {
+    /* Collect a partial block at tail end.  */
+    v |= e[-1]; w |= e[-2];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-3]; w |= e[-4];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-5]; w |= e[-6];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-7]; v |= w;
+
+    /*
+     * Loop over complete 128-byte blocks.
+     * With the head and tail removed, e - p >= 14, so the loop
+     * must iterate at least once.
+     */
+    do {
+        v = _mm_cmpeq_epi8(v, zero);
+        if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) {
             return false;
         }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
+        v = p[0]; w = p[1];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[2]; w |= p[3];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[4]; w |= p[5];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[6]; w |= p[7];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= w;
+        p += 8;
+    } while (p < e - 7);

-    /* Finish the aligned tail.  */
-    t |= e[-3];
-    t |= e[-2];
-    t |= e[-1];
-
-    /* Finish the unaligned tail.  */
-    t |= _mm_loadu_si128(buf + len - 16);
-
-    return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF;
+    return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF;
 }

 #ifdef CONFIG_AVX2_OPT
 static bool __attribute__((target("avx2")))
 buffer_zero_avx2(const void *buf, size_t len)
 {
-    /* Begin with an unaligned head of 32 bytes.  */
-    __m256i t = _mm256_loadu_si256(buf);
-    __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32);
-    __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32);
+    /* Unaligned loads at head/tail.  */
+    __m256i v = *(__m256i_u *)(buf);
+    __m256i w = *(__m256i_u *)(buf + len - 32);
+    /* Align head/tail to 32-byte boundaries.  */
+    const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32);
+    const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32);
+    __m256i zero = { 0 };

-    /* Loop over 32-byte aligned blocks of 128.  */
-    while (p <= e) {
-        if (unlikely(!_mm256_testz_si256(t, t))) {
+    /* Collect a partial block at tail end.  */
+    v |= e[-1]; w |= e[-2];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-3]; w |= e[-4];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-5]; w |= e[-6];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-7]; v |= w;
+
+    /* Loop over complete 256-byte blocks.  */
+    for (; p < e - 7; p += 8) {
+        /* PTEST is not profitable here.  */
+        v = _mm256_cmpeq_epi8(v, zero);
+        if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) {
             return false;
         }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    } ;
+        v = p[0]; w = p[1];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[2]; w |= p[3];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[4]; w |= p[5];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[6]; w |= p[7];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= w;
+    }

-    /* Finish the last block of 128 unaligned.  */
-    t |= _mm256_loadu_si256(buf + len - 4 * 32);
-    t |= _mm256_loadu_si256(buf + len - 3 * 32);
-    t |= _mm256_loadu_si256(buf + len - 2 * 32);
-    t |= _mm256_loadu_si256(buf + len - 1 * 32);
-
-    return _mm256_testz_si256(t, t);
+    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) == 0xFFFFFFFF;
 }
 #endif /* CONFIG_AVX2_OPT */

From patchwork Fri May 3 15:13:10 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794391
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: Philippe Mathieu-Daudé
Subject: [PULL 06/10] util/bufferiszero: Improve scalar variant
Date: Fri, 3 May 2024 08:13:10 -0700
Message-Id: <20240503151314.336357-7-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

Split less-than and greater-than 256 cases.
Use unaligned accesses for head and tail.
Avoid using out-of-bounds pointers in loop boundary conditions.

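The head/tail idea for the small case can be sketched in portable C. This
is an illustration under stated assumptions, not the patch itself:
small_buffer_is_zero, load_u32 and load_u64 are hypothetical names, the
memcpy-based loads stand in for QEMU's ldl_he_p()/ldq_he_p(), and the
middle loop steps by byte offset rather than aligning the pointer down as
the patch does. As in the patch, lengths of 3 or less are assumed to have
been handled by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable unaligned loads (hypothetical stand-ins for ldl_he_p/ldq_he_p). */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static bool small_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *b = buf;

    /* Precondition of this sketch; shorter buffers are the caller's job. */
    assert(len >= 4 && len < 256);

    if (len <= 8) {
        /* Two overlapping 4-byte loads cover bytes 0 .. len - 1. */
        return (load_u32(b) | load_u32(b + len - 4)) == 0;
    }

    /* Head and tail words; they may overlap each other or the middle. */
    uint64_t t = load_u64(b) | load_u64(b + len - 8);

    /* Middle words that lie entirely before the tail word's last byte. */
    for (size_t off = 8; off + 8 < len; off += 8) {
        t |= load_u64(b + off);
    }
    return t == 0;
}
```

Because OR is idempotent, reading a byte twice through overlapping loads
is harmless, which is what removes the need for a byte-at-a-time loop on
short or oddly sized buffers.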
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 util/bufferiszero.c | 85 +++++++++++++++++++++++++++------------------
 1 file changed, 51 insertions(+), 34 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 02df82b4ff..c9a7ded016 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -28,40 +28,57 @@

 static bool (*buffer_is_zero_accel)(const void *, size_t);

-static bool buffer_is_zero_integer(const void *buf, size_t len)
+static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
-    if (unlikely(len < 8)) {
-        /* For a very small buffer, simply accumulate all the bytes.  */
-        const unsigned char *p = buf;
-        const unsigned char *e = buf + len;
-        unsigned char t = 0;
+    uint64_t t;
+    const uint64_t *p, *e;

-        do {
-            t |= *p++;
-        } while (p < e);
-
-        return t == 0;
-    } else {
-        /* Otherwise, use the unaligned memory access functions to
-           handle the beginning and end of the buffer, with a couple
-           of loops handling the middle aligned section.  */
-        uint64_t t = ldq_he_p(buf);
-        const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8);
-        const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);
-
-        for (; p + 8 <= e; p += 8) {
-            if (t) {
-                return false;
-            }
-            t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7];
-        }
-        while (p < e) {
-            t |= *p++;
-        }
-        t |= ldq_he_p(buf + len - 8);
-
-        return t == 0;
+    /*
+     * Use unaligned memory access functions to handle
+     * the beginning and end of the buffer.
+     */
+    if (unlikely(len <= 8)) {
+        return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0;
     }
+
+    t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);
+    p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+    e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+    /* Read 0 to 31 aligned words from the middle. */
+    while (p < e) {
+        t |= *p++;
+    }
+    return t == 0;
+}
+
+static bool buffer_is_zero_int_ge256(const void *buf, size_t len)
+{
+    /*
+     * Use unaligned memory access functions to handle
+     * the beginning and end of the buffer.
+ */ + uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8); + const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8); + const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8); + + /* Collect a partial block at the tail end. */ + t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1]; + + /* + * Loop over 64 byte blocks. + * With the head and tail removed, e - p >= 30, + * so the loop must iterate at least 3 times. + */ + do { + if (t) { + return false; + } + t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7]; + p += 8; + } while (p < e - 7); + + return t == 0; } #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__) @@ -173,7 +190,7 @@ select_accel_cpuinfo(unsigned info) { CPUINFO_AVX2, buffer_zero_avx2 }, #endif { CPUINFO_SSE2, buffer_zero_sse2 }, - { CPUINFO_ALWAYS, buffer_is_zero_integer }, + { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 }, }; for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) { @@ -211,7 +228,7 @@ bool test_buffer_is_zero_next_accel(void) return false; } -#define INIT_ACCEL buffer_is_zero_integer +#define INIT_ACCEL buffer_is_zero_int_ge256 #endif static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL; @@ -232,7 +249,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len) if (likely(len >= 256)) { return buffer_is_zero_accel(buf, len); } - return buffer_is_zero_integer(buf, len); + return buffer_is_zero_int_lt256(buf, len); } bool buffer_is_zero_ge256(const void *buf, size_t len) From patchwork Fri May 3 15:13:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Henderson X-Patchwork-Id: 794383 Delivered-To: patch@linaro.org Received: by 2002:adf:a153:0:b0:34d:5089:5a9e with SMTP id r19csp788543wrr; Fri, 3 May 2024 08:14:00 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCWaDfZiJBZ1/DkOjNXV9xQkwjy5DsSgvVB+UGOwBZyUXz+ngr6YgkPh7ymK5wVGsENk/gvIruTLMRFMsKTq1nH/ X-Google-Smtp-Source: 
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?=
Subject: [PULL 07/10] util/bufferiszero: Introduce biz_accel_fn typedef
Date: Fri, 3 May 2024 08:13:11 -0700
Message-Id: <20240503151314.336357-8-richard.henderson@linaro.org>
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 util/bufferiszero.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index c9a7ded016..f9af7841ba 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -26,7 +26,8 @@
 #include "qemu/bswap.h"
 #include "host/cpuinfo.h"
 
-static bool (*buffer_is_zero_accel)(const void *, size_t);
+typedef bool (*biz_accel_fn)(const void *, size_t);
+static biz_accel_fn buffer_is_zero_accel;
 
 static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
@@ -184,7 +185,7 @@ select_accel_cpuinfo(unsigned info)
     /* Array is sorted in order of algorithm preference. */
     static const struct {
         unsigned bit;
-        bool (*fn)(const void *, size_t);
+        biz_accel_fn fn;
     } all[] = {
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2, buffer_zero_avx2 },
@@ -231,7 +232,7 @@ bool test_buffer_is_zero_next_accel(void)
 #define INIT_ACCEL buffer_is_zero_int_ge256
 #endif
 
-static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL;
+static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL;
 
 bool buffer_is_zero_ool(const void *buf, size_t len)
 {

From patchwork Fri May 3 15:13:12 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794381
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?=
Subject: [PULL 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel
Date: Fri, 3 May 2024 08:13:12 -0700
Message-Id: <20240503151314.336357-9-richard.henderson@linaro.org>
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

Because the three alternatives are monotonic, we don't need
to keep a couple of bitmasks, just identify the strongest
alternative at startup.

Generalize test_buffer_is_zero_next_accel and init_accel by
always defining an accel_table array.

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 util/bufferiszero.c | 81 ++++++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 46 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index f9af7841ba..7218154a13 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -27,7 +27,6 @@
 #include "host/cpuinfo.h"
 
 typedef bool (*biz_accel_fn)(const void *, size_t);
-static biz_accel_fn buffer_is_zero_accel;
 
 static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
@@ -179,60 +178,35 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-static unsigned __attribute__((noinline))
-select_accel_cpuinfo(unsigned info)
-{
-    /* Array is sorted in order of algorithm preference. */
-    static const struct {
-        unsigned bit;
-        biz_accel_fn fn;
-    } all[] = {
+static biz_accel_fn const accel_table[] = {
+    buffer_is_zero_int_ge256,
+    buffer_zero_sse2,
 #ifdef CONFIG_AVX2_OPT
-        { CPUINFO_AVX2, buffer_zero_avx2 },
+    buffer_zero_avx2,
 #endif
-        { CPUINFO_SSE2, buffer_zero_sse2 },
-        { CPUINFO_ALWAYS, buffer_is_zero_int_ge256 },
-    };
+};
 
-    for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
-        if (info & all[i].bit) {
-            buffer_is_zero_accel = all[i].fn;
-            return all[i].bit;
-        }
+static unsigned best_accel(void)
+{
+    unsigned info = cpuinfo_init();
+
+#ifdef CONFIG_AVX2_OPT
+    if (info & CPUINFO_AVX2) {
+        return 2;
     }
-    return 0;
+#endif
+    return info & CPUINFO_SSE2 ? 1 : 0;
 }
 
-static unsigned used_accel;
-
-static void __attribute__((constructor)) init_accel(void)
-{
-    used_accel = select_accel_cpuinfo(cpuinfo_init());
-}
-
-#define INIT_ACCEL NULL
-
-bool test_buffer_is_zero_next_accel(void)
-{
-    /*
-     * Accumulate the accelerators that we've already tested, and
-     * remove them from the set to test this round. We'll get back
-     * a zero from select_accel_cpuinfo when there are no more.
-     */
-    unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel);
-    used_accel |= used;
-    return used;
-}
 #else
-bool test_buffer_is_zero_next_accel(void)
-{
-    return false;
-}
-
-#define INIT_ACCEL buffer_is_zero_int_ge256
+#define best_accel() 0
+static biz_accel_fn const accel_table[1] = {
+    buffer_is_zero_int_ge256
+};
 #endif
 
-static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL;
+static biz_accel_fn buffer_is_zero_accel;
+static unsigned accel_index;
 
 bool buffer_is_zero_ool(const void *buf, size_t len)
 {
@@ -257,3 +231,18 @@ bool buffer_is_zero_ge256(const void *buf, size_t len)
 {
     return buffer_is_zero_accel(buf, len);
 }
+
+bool test_buffer_is_zero_next_accel(void)
+{
+    if (accel_index != 0) {
+        buffer_is_zero_accel = accel_table[--accel_index];
+        return true;
+    }
+    return false;
+}
+
+static void __attribute__((constructor)) init_accel(void)
+{
+    accel_index = best_accel();
+    buffer_is_zero_accel = accel_table[accel_index];
+}

From patchwork Fri May 3 15:13:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794388
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?=
Subject: [PULL 09/10] util/bufferiszero: Add simd acceleration for aarch64
Date: Fri, 3 May 2024 08:13:13 -0700
Message-Id: <20240503151314.336357-10-richard.henderson@linaro.org>
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
double-check with the compiler flags for __ARM_NEON and don't bother with
a runtime check. Otherwise, model the loop after the x86 SSE2 function.

Use UMAXV for the vector reduction. This is 3 cycles on cortex-a76 and
2 cycles on neoverse-n1.

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 util/bufferiszero.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 7218154a13..74864f7b78 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -198,6 +198,73 @@ static unsigned best_accel(void)
     return info & CPUINFO_SSE2 ? 1 : 0;
 }
 
+#elif defined(__aarch64__) && defined(__ARM_NEON)
+#include <arm_neon.h>
+
+/*
+ * Helper for preventing the compiler from reassociating
+ * chains of binary vector operations.
+ */
+#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1))
+
+static bool buffer_is_zero_simd(const void *buf, size_t len)
+{
+    uint32x4_t t0, t1, t2, t3;
+
+    /* Align head/tail to 16-byte boundaries. */
+    const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16);
+    const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16);
+
+    /* Unaligned loads at head/tail. */
+    t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16);
+
+    /* Collect a partial block at tail end. */
+    t1 = e[-7] | e[-6];
+    t2 = e[-5] | e[-4];
+    t3 = e[-3] | e[-2];
+    t0 |= e[-1];
+    REASSOC_BARRIER(t0, t1);
+    REASSOC_BARRIER(t2, t3);
+    t0 |= t1;
+    t2 |= t3;
+    REASSOC_BARRIER(t0, t2);
+    t0 |= t2;
+
+    /*
+     * Loop over complete 128-byte blocks.
+     * With the head and tail removed, e - p >= 14, so the loop
+     * must iterate at least once.
+     */
+    do {
+        /*
+         * Reduce via UMAXV. Whatever the actual result,
+         * it will only be zero if all input bytes are zero.
+         */
+        if (unlikely(vmaxvq_u32(t0) != 0)) {
+            return false;
+        }
+
+        t0 = p[0] | p[1];
+        t1 = p[2] | p[3];
+        t2 = p[4] | p[5];
+        t3 = p[6] | p[7];
+        REASSOC_BARRIER(t0, t1);
+        REASSOC_BARRIER(t2, t3);
+        t0 |= t1;
+        t2 |= t3;
+        REASSOC_BARRIER(t0, t2);
+        t0 |= t2;
+        p += 8;
+    } while (p < e - 7);
+
+    return vmaxvq_u32(t0) == 0;
+}
+
+#define best_accel() 1
+static biz_accel_fn const accel_table[] = {
+    buffer_is_zero_int_ge256,
+    buffer_is_zero_simd,
+};
 #else
 #define best_accel() 0
 static biz_accel_fn const accel_table[1] = {

From patchwork Fri May 3 15:13:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 794386
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?=
Subject: [PULL 10/10] tests/bench: Add bufferiszero-bench
Date: Fri, 3 May 2024 08:13:14 -0700
Message-Id: <20240503151314.336357-11-richard.henderson@linaro.org>
In-Reply-To: <20240503151314.336357-1-richard.henderson@linaro.org>
References: <20240503151314.336357-1-richard.henderson@linaro.org>

Benchmark each acceleration function vs an aligned buffer of zeros.

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 tests/bench/bufferiszero-bench.c | 47 ++++++++++++++++++++++++++++++++
 tests/bench/meson.build          |  1 +
 2 files changed, 48 insertions(+)
 create mode 100644 tests/bench/bufferiszero-bench.c

diff --git a/tests/bench/bufferiszero-bench.c b/tests/bench/bufferiszero-bench.c
new file mode 100644
index 0000000000..222695c1fa
--- /dev/null
+++ b/tests/bench/bufferiszero-bench.c
@@ -0,0 +1,47 @@
+/*
+ * QEMU buffer_is_zero speed benchmark
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version. See the COPYING file in the
+ * top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+
+static void test(const void *opaque)
+{
+    size_t max = 64 * KiB;
+    void *buf = g_malloc0(max);
+    int accel_index = 0;
+
+    do {
+        if (accel_index != 0) {
+            g_test_message("%s", ""); /* gnu_printf Werror for simple "" */
+        }
+        for (size_t len = 1 * KiB; len <= max; len *= 4) {
+            double total = 0.0;
+
+            g_test_timer_start();
+            do {
+                buffer_is_zero_ge256(buf, len);
+                total += len;
+            } while (g_test_timer_elapsed() < 0.5);
+
+            total /= MiB;
+            g_test_message("buffer_is_zero #%d: %2zuKB %8.0f MB/sec",
+                           accel_index, len / (size_t)KiB,
+                           total / g_test_timer_last());
+        }
+        accel_index++;
+    } while (test_buffer_is_zero_next_accel());
+
+    g_free(buf);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_data_func("/cutils/bufferiszero/speed", NULL, test);
+    return g_test_run();
+}
diff --git a/tests/bench/meson.build b/tests/bench/meson.build
index 7e76338a52..4cd7a2f6b5 100644
--- a/tests/bench/meson.build
+++ b/tests/bench/meson.build
@@ -21,6 +21,7 @@ benchs = {}
 
 if have_block
   benchs += {
+     'bufferiszero-bench': [],
      'benchmark-crypto-hash': [crypto],
      'benchmark-crypto-hmac': [crypto],
      'benchmark-crypto-cipher': [crypto],
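
For readers following the series, the step-down walk that the benchmark's
outer do/while loop relies on (the accel_table ordering and
test_buffer_is_zero_next_accel from patch 08) can be modelled outside QEMU.
The sketch below is not QEMU code. The names is_zero_bytewise,
is_zero_wordwise, reset_accel, next_accel and check_all_accels are
hypothetical stand-ins, and the two-entry table is an assumption made only
so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef bool (*biz_accel_fn)(const void *, size_t);

/* Hypothetical baseline: test one byte at a time. */
static bool is_zero_bytewise(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical "stronger" variant: OR together 8-byte words. */
static bool is_zero_wordwise(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));   /* alignment-safe load */
        acc |= w;
    }
    for (; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Weakest first, strongest last, as in the patch 08 table. */
static biz_accel_fn const accel_table[] = {
    is_zero_bytewise,
    is_zero_wordwise,
};
#define NUM_ACCEL (sizeof(accel_table) / sizeof(accel_table[0]))

static unsigned accel_index;
static biz_accel_fn current_accel;

/* Select the strongest entry (stand-in for the constructor). */
static void reset_accel(void)
{
    accel_index = NUM_ACCEL - 1;
    current_accel = accel_table[accel_index];
}

/* Step down one entry; returns false once the weakest has been used. */
static bool next_accel(void)
{
    if (accel_index != 0) {
        current_accel = accel_table[--accel_index];
        return true;
    }
    return false;
}

/* Run every entry on buf and count how many give the expected answer. */
static unsigned check_all_accels(const void *buf, size_t len, bool expected)
{
    unsigned agree = 0;

    reset_accel();
    do {
        assert(accel_index < NUM_ACCEL);
        agree += (current_accel(buf, len) == expected);
    } while (next_accel());
    return agree;
}
```

Because the table is ordered weakest to strongest, a single index is enough
state: starting at the best entry and decrementing visits each
implementation exactly once, which is the property the commit message for
patch 08 calls monotonic.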