[PULL,28/35] util/bufferiszero: Remove SSE4.1 variant

Message ID	20240408174929.862917-29-richard.henderson@linaro.org
State	Superseded
Headers
Delivered-To: patch@linaro.org
Received-SPF: pass (google.com: domain of
 qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as
 permitted sender) client-ip=209.51.188.17;
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: Alexander Monakov <amonakov@ispras.ru>,
 Mikhail Romanov <mmromanov@ispras.ru>
Subject: [PULL 28/35] util/bufferiszero: Remove SSE4.1 variant
Date: Mon, 8 Apr 2024 07:49:22 -1000
Message-Id: <20240408174929.862917-29-richard.henderson@linaro.org>
In-Reply-To: <20240408174929.862917-1-richard.henderson@linaro.org>
References: <20240408174929.862917-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::52b;
 envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x52b.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
Precedence: list
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org

Series
[PULL,01/35] tcg/optimize: Do not attempt to constant fold neg_vec
[PULL,02/35] linux-user: Fix waitid return of siginfo_t and rusage
[PULL,03/35] linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY
[PULL,04/35] linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline tar...
[PULL,05/35] linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used
[PULL,06/35] linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO
[PULL,07/35] linux-user: Add FITRIM ioctl
[PULL,08/35] linux-user: replace calloc() with g_new0()
[PULL,09/35] target/hppa: Fix IIAOQ, IIASQ for pa2.0
[PULL,10/35] target/sh4: mac.w: memory accesses are 16-bit words
[PULL,11/35] target/sh4: Merge mach and macl into a union
[PULL,12/35] target/sh4: Fix mac.l with saturation enabled
[PULL,13/35] target/sh4: Fix mac.w with saturation enabled
[PULL,14/35] target/sh4: add missing CHECK_NOT_DELAY_SLOT
[PULL,15/35] target/m68k: Map FPU exceptions to FPSR register
[PULL,16/35] target/m68k: Pass semihosting arg to exit
[PULL,17/35] target/m68k: Perform the semihosting test during translate
[PULL,18/35] target/m68k: Support semihosting on non-ColdFire targets
[PULL,19/35] tcg: Add TCGContext.emit_before_op
[PULL,20/35] accel/tcg: Add insn_start to DisasContextBase
[PULL,21/35] target/arm: Use insn_start from DisasContextBase
[PULL,22/35] target/hppa: Use insn_start from DisasContextBase
[PULL,23/35] target/i386: Preserve DisasContextBase.insn_start across rewind
[PULL,24/35] target/microblaze: Use insn_start from DisasContextBase
[PULL,25/35] target/riscv: Use insn_start from DisasContextBase
[PULL,26/35] target/s390x: Use insn_start from DisasContextBase
[PULL,27/35] accel/tcg: Improve can_do_io management
[PULL,28/35] util/bufferiszero: Remove SSE4.1 variant
[PULL,29/35] util/bufferiszero: Remove AVX512 variant
[PULL,30/35] util/bufferiszero: Reorganize for early test for acceleration
[PULL,31/35] util/bufferiszero: Remove useless prefetches
[PULL,32/35] util/bufferiszero: Optimize SSE2 and AVX2 variants
[PULL,33/35] util/bufferiszero: Improve scalar variant
[PULL,34/35] util/bufferiszero: Introduce biz_accel_fn typedef
[PULL,35/35] util/bufferiszero: Simplify test_buffer_is_zero_next_accel

Commit Message

Richard Henderson April 8, 2024, 5:49 p.m. UTC

From: Alexander Monakov <amonakov@ispras.ru>

The SSE4.1 variant is virtually identical to the SSE2 variant, except
for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing
if an SSE register is all zeroes. The PTEST instruction decodes to two
uops, so it can be handled only by the complex decoder, and since
CMP+JNE are macro-fused, both sequences decode to three uops. The uops
comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so
PCMPEQB+PMOVMSKB is comparatively more flexible from a dispatch
standpoint.

Hence, the use of PTEST brings no benefit from a throughput standpoint.
Its latency is not important, since it feeds only a conditional jump,
which terminates the dependency chain.

I have never observed the PTEST variants to be faster on real hardware.

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-2-amonakov@ispras.ru>
---
 util/bufferiszero.c | 29 -----------------------------
 1 file changed, 29 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 3e6a5dfd63..f5a3634f9a 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -100,34 +100,6 @@  buffer_zero_sse2(const void *buf, size_t len)
 }
 
 #ifdef CONFIG_AVX2_OPT
-static bool __attribute__((target("sse4")))
-buffer_zero_sse4(const void *buf, size_t len)
-{
-    __m128i t = _mm_loadu_si128(buf);
-    __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16);
-    __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16);
-
-    /* Loop over 16-byte aligned blocks of 64.  */
-    while (likely(p <= e)) {
-        __builtin_prefetch(p);
-        if (unlikely(!_mm_testz_si128(t, t))) {
-            return false;
-        }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
-
-    /* Finish the aligned tail.  */
-    t |= e[-3];
-    t |= e[-2];
-    t |= e[-1];
-
-    /* Finish the unaligned tail.  */
-    t |= _mm_loadu_si128(buf + len - 16);
-
-    return _mm_testz_si128(t, t);
-}
-
 static bool __attribute__((target("avx2")))
 buffer_zero_avx2(const void *buf, size_t len)
 {
@@ -221,7 +193,6 @@  select_accel_cpuinfo(unsigned info)
 #endif
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2,    128, buffer_zero_avx2 },
-        { CPUINFO_SSE4,     64, buffer_zero_sse4 },
 #endif
         { CPUINFO_SSE2,     64, buffer_zero_sse2 },
         { CPUINFO_ALWAYS,    0, buffer_zero_int },
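
For readers unfamiliar with the two instruction sequences compared in the commit message, the following is a minimal sketch of the two all-zero tests written with compiler intrinsics. It assumes an x86-64 host and GCC or Clang; the helper names vec_is_zero_sse2, vec_is_zero_sse4 and vec_tests_agree are hypothetical and do not appear in util/bufferiszero.c. The exact instructions emitted depend on the compiler, so the comments describe the typical lowering only.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8 */
#include <smmintrin.h>  /* SSE4.1: _mm_testz_si128 */
#include <stdbool.h>

/*
 * SSE2 idiom: compare every byte against zero (PCMPEQB), gather the
 * 16 per-byte results into a bitmask (PMOVMSKB), then compare the mask
 * with 0xFFFF (CMP, typically fused with the following branch).
 */
static bool vec_is_zero_sse2(__m128i v)
{
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/*
 * SSE4.1 idiom: PTEST sets ZF when (v & v) == 0, so the result can feed
 * a conditional jump directly. The target attribute mirrors the style of
 * the removed buffer_zero_sse4(); it is a GCC/Clang extension.
 */
static bool __attribute__((target("sse4.1")))
vec_is_zero_sse4(__m128i v)
{
    return _mm_testz_si128(v, v);
}

/* Hypothetical self-check: both idioms must give the same answer. */
static bool __attribute__((target("sse4.1")))
vec_tests_agree(__m128i v)
{
    bool a = vec_is_zero_sse2(v);
    bool b = vec_is_zero_sse4(v);
    assert(a == b);
    return a == b;
}
```

Both helpers return the same result for any input; the commit message's argument is only about how the sequences decode and dispatch, which is why dropping the PTEST form loses no functionality.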
