From patchwork Mon Feb 10 17:45:35 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 863936 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A80862566D1; Mon, 10 Feb 2025 17:46:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739209561; cv=none; b=FXJ6Ih/kZnfFK45J6/bAQNBrvHWMwN7280KrtPOho21zJaeGhKIMrSQft0In/eKl03uJ9s0X7pbiMoKPG8Oqc4e3WYGNsqY0S3afo8gWZ5Aapt9hwNXEQXVxXpaDfeUnwsor62HF4iZbFlXtuTkLP3VAkStnviDpLH5rqAFO2tI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739209561; c=relaxed/simple; bh=z3VlGNh9Ebdd5u3D6w+au8gk9FZ2kN5XGr3s//44UMU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JEZW/UcTOZ9fuQZLX1cS4O/sK7krL/eH7Md5uWRZs9PE+C+wCp3EO3Yxt1qBoM94VoCZjFabpErp1LdOp1Az8Y+5SwokIDxZpeYqFE3QLWgv8p2fA6TyrrfLszAu8JWKAsdF6xlyN0rtaGnWTTiwswhsnVPsyr9QJHsE5KTEaSw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GW0jRFly; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GW0jRFly" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BF98FC4CEE6; Mon, 10 Feb 2025 17:46:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739209561; bh=z3VlGNh9Ebdd5u3D6w+au8gk9FZ2kN5XGr3s//44UMU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GW0jRFly3izf6lWvztj9ZZjOQCTCztOusGsG+jnQbS0q3F4Gou3gKN5TrlC9IC8xZ IhDt6KLnldJiIu644ADRmjTXCGPCxkCt2AoDoZVyBSb2MhHcgCeuKz0q/fB/qUuf6d wgN6XfQ44AnyAUOugiLF+dxEesw13H+CFhLDyBwzCKQlLbYH25Se37PdptrYvl37sk Fil5Xi2I4WZZIeYf1RfDtDUDSWzW+AShGA+0Wdf6aTGhnyCyiCGudxZqKaz5u0K22q 4Aj/61jvOGcMcPOidql9TGp1VtaG2QFFPEugSzYG+MuiDUzERI6n7iubf6sfI/HgsC gqYW5I5y3rKlg== From: Eric Biggers To: linux-kernel@vger.kernel.org Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-block@vger.kernel.org, Ard Biesheuvel , Keith Busch , Kent Overstreet , "Martin K . Petersen" , Ingo Molnar Subject: [PATCH v4 1/6] x86: move ZMM exclusion list into CPU feature flag Date: Mon, 10 Feb 2025 09:45:35 -0800 Message-ID: <20250210174540.161705-2-ebiggers@kernel.org> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210174540.161705-1-ebiggers@kernel.org> References: <20250210174540.161705-1-ebiggers@kernel.org> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Eric Biggers Lift zmm_exclusion_list in aesni-intel_glue.c into the x86 CPU setup code, and add a new x86 CPU feature flag X86_FEATURE_PREFER_YMM that is set when the CPU is on this list. This allows other code in arch/x86/, such as the CRC library code, to apply the same exclusion list when deciding whether to execute 256-bit or 512-bit optimized functions. Note that full AVX512 support including ZMM registers is still exposed to userspace and is still supported for in-kernel use. This flag just indicates whether in-kernel code should prefer to use YMM registers. Acked-by: Ard Biesheuvel Acked-by: Ingo Molnar Acked-by: Keith Busch Reviewed-by: "Martin K. Petersen" Signed-off-by: Eric Biggers --- arch/x86/crypto/aesni-intel_glue.c | 22 +--------------------- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/intel.c | 22 ++++++++++++++++++++++ 3 files changed, 24 insertions(+), 21 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 11e95fc62636e..3e9ab5cdade47 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -1534,30 +1534,10 @@ DEFINE_GCM_ALGS(vaes_avx10_256, FLAG_AVX10_256, DEFINE_GCM_ALGS(vaes_avx10_512, FLAG_AVX10_512, "generic-gcm-vaes-avx10_512", "rfc4106-gcm-vaes-avx10_512", AES_GCM_KEY_AVX10_SIZE, 800); #endif /* CONFIG_AS_VAES && CONFIG_AS_VPCLMULQDQ */ -/* - * This is a list of CPU models that are known to suffer from downclocking when - * zmm registers (512-bit vectors) are used. On these CPUs, the AES mode - * implementations with zmm registers won't be used by default. Implementations - * with ymm registers (256-bit vectors) will be used by default instead. - */ -static const struct x86_cpu_id zmm_exclusion_list[] = { - X86_MATCH_VFM(INTEL_SKYLAKE_X, 0), - X86_MATCH_VFM(INTEL_ICELAKE_X, 0), - X86_MATCH_VFM(INTEL_ICELAKE_D, 0), - X86_MATCH_VFM(INTEL_ICELAKE, 0), - X86_MATCH_VFM(INTEL_ICELAKE_L, 0), - X86_MATCH_VFM(INTEL_ICELAKE_NNPI, 0), - X86_MATCH_VFM(INTEL_TIGERLAKE_L, 0), - X86_MATCH_VFM(INTEL_TIGERLAKE, 0), - /* Allow Rocket Lake and later, and Sapphire Rapids and later. */ - /* Also allow AMD CPUs (starting with Zen 4, the first with AVX-512). */ - {}, -}; - static int __init register_avx_algs(void) { int err; if (!boot_cpu_has(X86_FEATURE_AVX)) @@ -1598,11 +1578,11 @@ static int __init register_avx_algs(void) ARRAY_SIZE(aes_gcm_algs_vaes_avx10_256), aes_gcm_simdalgs_vaes_avx10_256); if (err) return err; - if (x86_match_cpu(zmm_exclusion_list)) { + if (boot_cpu_has(X86_FEATURE_PREFER_YMM)) { int i; aes_xts_alg_vaes_avx10_512.base.cra_priority = 1; for (i = 0; i < ARRAY_SIZE(aes_gcm_algs_vaes_avx10_512); i++) aes_gcm_algs_vaes_avx10_512[i].base.cra_priority = 1; diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 508c0dad116bc..99334026a26c9 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -481,10 +481,11 @@ #define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* BHI_DIS_S HW control enabled */ #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */ #define X86_FEATURE_AMD_FAST_CPPC (21*32 + 5) /* Fast CPPC */ #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */ #define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */ +#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */ /* * BUG word(s) */ #define X86_BUG(x) (NCAPINTS*32 + (x)) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 3dce22f00dc34..c3005c4ec46a9 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -519,10 +519,29 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c) msr = this_cpu_read(msr_misc_features_shadow); wrmsrl(MSR_MISC_FEATURES_ENABLES, msr); } +/* + * This is a list of Intel CPUs that are known to suffer from downclocking when + * ZMM registers (512-bit vectors) are used. On these CPUs, when the kernel + * executes SIMD-optimized code such as cryptography functions or CRCs, it + * should prefer 256-bit (YMM) code to 512-bit (ZMM) code. + */ +static const struct x86_cpu_id zmm_exclusion_list[] = { + X86_MATCH_VFM(INTEL_SKYLAKE_X, 0), + X86_MATCH_VFM(INTEL_ICELAKE_X, 0), + X86_MATCH_VFM(INTEL_ICELAKE_D, 0), + X86_MATCH_VFM(INTEL_ICELAKE, 0), + X86_MATCH_VFM(INTEL_ICELAKE_L, 0), + X86_MATCH_VFM(INTEL_ICELAKE_NNPI, 0), + X86_MATCH_VFM(INTEL_TIGERLAKE_L, 0), + X86_MATCH_VFM(INTEL_TIGERLAKE, 0), + /* Allow Rocket Lake and later, and Sapphire Rapids and later. */ + {}, +}; + static void init_intel(struct cpuinfo_x86 *c) { early_init_intel(c); intel_workarounds(c); @@ -599,10 +618,13 @@ static void init_intel(struct cpuinfo_x86 *c) if (p) strcpy(c->x86_model_id, p); } #endif + if (x86_match_cpu(zmm_exclusion_list)) + set_cpu_cap(c, X86_FEATURE_PREFER_YMM); + /* Work around errata */ srat_detect_node(c); init_ia32_feat_ctl(c); From patchwork Mon Feb 10 17:45:38 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 863935 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9E96257456; Mon, 10 Feb 2025 17:46:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739209562; cv=none; b=FeBq0HWR/afBNejmYsVE94r26ZDXKRJ/UBaIlCVYoM+9Y97OHdpdbfV2YKj08SUpvAj1MIhcKb3eu/siNt05AkXpd5Zid7x8z1zv37FVUW/iRXgGdYVj1vu4bDh2WHQhcoolgpwDAMVU+q9KkRUOUtNGX3FKeeqJvRQluFYoYHQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739209562; c=relaxed/simple; bh=F4V82h98SHCOKzxf3nBrpwV7MPYF4gyLV1ygLyE8b+w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IY0IMapNm2taHVr/Haiu2GUqe/B7Biou4JY42fwFIcq5MgIaXQ7ZMg+TxLv/kolIitdlnu1g+P6FKSnHg+mwx3MIrszOhc/0oFyA0CL8gw70GqDBnmd83m6euI+4Ta776ysXh6Sd9n7L1MyUK+jOWMclnJVYBKR0bsaEq7w7Xcc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=R5UCLQ+p; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R5UCLQ+p" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 05032C4CEE7; Mon, 10 Feb 2025 17:46:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739209562; bh=F4V82h98SHCOKzxf3nBrpwV7MPYF4gyLV1ygLyE8b+w=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=R5UCLQ+puGZaD6dtGGZf6eD0vpXXfDYk3kc9equrs5pMkBGT4MVAMmtpXpo/VYrys lQ6hdCYq2/peAC3NaQR65v6OyKSE7LiTn/QKZpvpWuZ0dQVC8UXZ37ZFSvWhy/GiKY Jr6/8hsEBwhdoxYYN46dDeTDTSrezN8rxd5gNnb8+PPxawDMN/v0+obfLFBi4Y/3Z8 SokwXWPBTMTQwFbW/WY1umnIgvR2Hpy/NeV9fBlZLnCYS2+C74pUy5tp8ENvhPX+KW Pq2BHxIK3iEh+7V5ttsUepFcvq6PyOfNngHhaYR7vrd9XfAbIW/MOkosxcvj2aEcG3 YcW9zL0p6iHKg== From: Eric Biggers To: linux-kernel@vger.kernel.org Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-block@vger.kernel.org, Ard Biesheuvel , Keith Busch , Kent Overstreet , "Martin K . Petersen" Subject: [PATCH v4 4/6] x86/crc32: implement crc32_le using new template Date: Mon, 10 Feb 2025 09:45:38 -0800 Message-ID: <20250210174540.161705-5-ebiggers@kernel.org> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210174540.161705-1-ebiggers@kernel.org> References: <20250210174540.161705-1-ebiggers@kernel.org> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Eric Biggers Instantiate crc-pclmul-template.S for crc32_le, and delete the original PCLMULQDQ optimized implementation. This has the following advantages: - Less CRC-variant-specific code. - VPCLMULQDQ support, greatly improving performance on sufficiently long messages on newer CPUs. - A faster reduction from 128 bits to the final CRC. - Support for lengths not a multiple of 16 bytes, improving performance for such lengths. - Support for misaligned buffers, improving performance in such cases. Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit: Length Before After ------ ------ ----- 1 427 MB/s 605 MB/s 16 710 MB/s 3631 MB/s 64 704 MB/s 7615 MB/s 127 3610 MB/s 9710 MB/s 128 8759 MB/s 12702 MB/s 200 7083 MB/s 15343 MB/s 256 17284 MB/s 22904 MB/s 511 10919 MB/s 27309 MB/s 512 19849 MB/s 48900 MB/s 1024 21216 MB/s 62630 MB/s 3173 22150 MB/s 72437 MB/s 4096 22496 MB/s 79593 MB/s 16384 22018 MB/s 85106 MB/s Acked-by: Ard Biesheuvel Acked-by: Keith Busch Reviewed-by: "Martin K. Petersen" Signed-off-by: Eric Biggers --- arch/x86/lib/crc-pclmul-consts.h | 53 ++++++++ arch/x86/lib/crc32-glue.c | 37 ++---- arch/x86/lib/crc32-pclmul.S | 219 +------------------------------ 3 files changed, 65 insertions(+), 244 deletions(-) create mode 100644 arch/x86/lib/crc-pclmul-consts.h diff --git a/arch/x86/lib/crc-pclmul-consts.h b/arch/x86/lib/crc-pclmul-consts.h new file mode 100644 index 0000000000000..34fdcb0446b03 --- /dev/null +++ b/arch/x86/lib/crc-pclmul-consts.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * CRC constants generated by: + * + * ./scripts/gen-crc-consts.py x86_pclmul crc32_lsb_0xedb88320 + * + * Do not edit manually. + */ + +/* + * CRC folding constants generated for least-significant-bit-first CRC-32 using + * G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + + * x^5 + x^4 + x^2 + x^1 + x^0 + */ +static const struct { + u64 fold_across_2048_bits_consts[2]; + u64 fold_across_1024_bits_consts[2]; + u64 fold_across_512_bits_consts[2]; + u64 fold_across_256_bits_consts[2]; + u64 fold_across_128_bits_consts[2]; + u8 shuf_table[48]; + u64 barrett_reduction_consts[2]; +} crc32_lsb_0xedb88320_consts ____cacheline_aligned __maybe_unused = { + .fold_across_2048_bits_consts = { + 0x00000000ce3371cb, /* HI64_TERMS: (x^2079 mod G) * x^32 */ + 0x00000000e95c1271, /* LO64_TERMS: (x^2015 mod G) * x^32 */ + }, + .fold_across_1024_bits_consts = { + 0x0000000033fff533, /* HI64_TERMS: (x^1055 mod G) * x^32 */ + 0x00000000910eeec1, /* LO64_TERMS: (x^991 mod G) * x^32 */ + }, + .fold_across_512_bits_consts = { + 0x000000008f352d95, /* HI64_TERMS: (x^543 mod G) * x^32 */ + 0x000000001d9513d7, /* LO64_TERMS: (x^479 mod G) * x^32 */ + }, + .fold_across_256_bits_consts = { + 0x00000000f1da05aa, /* HI64_TERMS: (x^287 mod G) * x^32 */ + 0x0000000081256527, /* LO64_TERMS: (x^223 mod G) * x^32 */ + }, + .fold_across_128_bits_consts = { + 0x00000000ae689191, /* HI64_TERMS: (x^159 mod G) * x^32 */ + 0x00000000ccaa009e, /* LO64_TERMS: (x^95 mod G) * x^32 */ + }, + .shuf_table = { + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + }, + .barrett_reduction_consts = { + 0xb4e5b025f7011641, /* HI64_TERMS: floor(x^95 / G) */ + 0x00000001db710640, /* LO64_TERMS: (G - x^32) * x^31 */ + }, +}; diff --git a/arch/x86/lib/crc32-glue.c b/arch/x86/lib/crc32-glue.c index 131c305e9ea0d..9c3f9c1b7bb9d 100644 --- a/arch/x86/lib/crc32-glue.c +++ b/arch/x86/lib/crc32-glue.c @@ -5,47 +5,24 @@ * Copyright (C) 2008 Intel Corporation * Copyright 2012 Xyratex Technology Limited * Copyright 2024 Google LLC */ -#include -#include -#include #include -#include #include - -/* minimum size of buffer for crc32_pclmul_le_16 */ -#define CRC32_PCLMUL_MIN_LEN 64 +#include "crc-pclmul-template.h" static DEFINE_STATIC_KEY_FALSE(have_crc32); static DEFINE_STATIC_KEY_FALSE(have_pclmulqdq); -u32 crc32_pclmul_le_16(u32 crc, const u8 *buffer, size_t len); +DECLARE_CRC_PCLMUL_FUNCS(crc32_lsb, u32); u32 crc32_le_arch(u32 crc, const u8 *p, size_t len) { - if (len >= CRC32_PCLMUL_MIN_LEN + 15 && - static_branch_likely(&have_pclmulqdq) && crypto_simd_usable()) { - size_t n = -(uintptr_t)p & 15; - - /* align p to 16-byte boundary */ - if (n) { - crc = crc32_le_base(crc, p, n); - p += n; - len -= n; - } - n = round_down(len, 16); - kernel_fpu_begin(); - crc = crc32_pclmul_le_16(crc, p, n); - kernel_fpu_end(); - p += n; - len -= n; - } - if (len) - crc = crc32_le_base(crc, p, len); - return crc; + CRC_PCLMUL(crc, p, len, crc32_lsb, crc32_lsb_0xedb88320_consts, + have_pclmulqdq); + return crc32_le_base(crc, p, len); } EXPORT_SYMBOL(crc32_le_arch); #ifdef CONFIG_X86_64 #define CRC32_INST "crc32q %1, %q0" @@ -95,12 +72,14 @@ EXPORT_SYMBOL(crc32_be_arch); static int __init crc32_x86_init(void) { if (boot_cpu_has(X86_FEATURE_XMM4_2)) static_branch_enable(&have_crc32); - if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) + if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) { static_branch_enable(&have_pclmulqdq); + INIT_CRC_PCLMUL(crc32_lsb); + } return 0; } arch_initcall(crc32_x86_init); static void __exit crc32_x86_exit(void) diff --git a/arch/x86/lib/crc32-pclmul.S b/arch/x86/lib/crc32-pclmul.S index f9637789cac19..f20f40fb0172d 100644 --- a/arch/x86/lib/crc32-pclmul.S +++ b/arch/x86/lib/crc32-pclmul.S @@ -1,217 +1,6 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright 2012 Xyratex Technology Limited - * - * Using hardware provided PCLMULQDQ instruction to accelerate the CRC32 - * calculation. - * CRC32 polynomial:0x04c11db7(BE)/0xEDB88320(LE) - * PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found - * at: - * http://www.intel.com/products/processor/manuals/ - * Intel(R) 64 and IA-32 Architectures Software Developer's Manual - * Volume 2B: Instruction Set Reference, N-Z - * - * Authors: Gregory Prestas - * Alexander Boyko - */ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +// Copyright 2025 Google LLC -#include +#include "crc-pclmul-template.S" - -.section .rodata -.align 16 -/* - * [x4*128+32 mod P(x) << 32)]' << 1 = 0x154442bd4 - * #define CONSTANT_R1 0x154442bd4LL - * - * [(x4*128-32 mod P(x) << 32)]' << 1 = 0x1c6e41596 - * #define CONSTANT_R2 0x1c6e41596LL - */ -.Lconstant_R2R1: - .octa 0x00000001c6e415960000000154442bd4 -/* - * [(x128+32 mod P(x) << 32)]' << 1 = 0x1751997d0 - * #define CONSTANT_R3 0x1751997d0LL - * - * [(x128-32 mod P(x) << 32)]' << 1 = 0x0ccaa009e - * #define CONSTANT_R4 0x0ccaa009eLL - */ -.Lconstant_R4R3: - .octa 0x00000000ccaa009e00000001751997d0 -/* - * [(x64 mod P(x) << 32)]' << 1 = 0x163cd6124 - * #define CONSTANT_R5 0x163cd6124LL - */ -.Lconstant_R5: - .octa 0x00000000000000000000000163cd6124 -.Lconstant_mask32: - .octa 0x000000000000000000000000FFFFFFFF -/* - * #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL - * - * Barrett Reduction constant (u64`) = u` = (x**64 / P(x))` = 0x1F7011641LL - * #define CONSTANT_RU 0x1F7011641LL - */ -.Lconstant_RUpoly: - .octa 0x00000001F701164100000001DB710641 - -#define CONSTANT %xmm0 - -#ifdef __x86_64__ -#define CRC %edi -#define BUF %rsi -#define LEN %rdx -#else -#define CRC %eax -#define BUF %edx -#define LEN %ecx -#endif - - - -.text -/** - * Calculate crc32 - * CRC - initial crc32 - * BUF - buffer (16 bytes aligned) - * LEN - sizeof buffer (16 bytes aligned), LEN should be greater than 63 - * return %eax crc32 - * u32 crc32_pclmul_le_16(u32 crc, const u8 *buffer, size_t len); - */ - -SYM_FUNC_START(crc32_pclmul_le_16) /* buffer and buffer size are 16 bytes aligned */ - movdqa (BUF), %xmm1 - movdqa 0x10(BUF), %xmm2 - movdqa 0x20(BUF), %xmm3 - movdqa 0x30(BUF), %xmm4 - movd CRC, CONSTANT - pxor CONSTANT, %xmm1 - sub $0x40, LEN - add $0x40, BUF - cmp $0x40, LEN - jb .Lless_64 - -#ifdef __x86_64__ - movdqa .Lconstant_R2R1(%rip), CONSTANT -#else - movdqa .Lconstant_R2R1, CONSTANT -#endif - -.Lloop_64:/* 64 bytes Full cache line folding */ - prefetchnta 0x40(BUF) - movdqa %xmm1, %xmm5 - movdqa %xmm2, %xmm6 - movdqa %xmm3, %xmm7 -#ifdef __x86_64__ - movdqa %xmm4, %xmm8 -#endif - pclmulqdq $0x00, CONSTANT, %xmm1 - pclmulqdq $0x00, CONSTANT, %xmm2 - pclmulqdq $0x00, CONSTANT, %xmm3 -#ifdef __x86_64__ - pclmulqdq $0x00, CONSTANT, %xmm4 -#endif - pclmulqdq $0x11, CONSTANT, %xmm5 - pclmulqdq $0x11, CONSTANT, %xmm6 - pclmulqdq $0x11, CONSTANT, %xmm7 -#ifdef __x86_64__ - pclmulqdq $0x11, CONSTANT, %xmm8 -#endif - pxor %xmm5, %xmm1 - pxor %xmm6, %xmm2 - pxor %xmm7, %xmm3 -#ifdef __x86_64__ - pxor %xmm8, %xmm4 -#else - /* xmm8 unsupported for x32 */ - movdqa %xmm4, %xmm5 - pclmulqdq $0x00, CONSTANT, %xmm4 - pclmulqdq $0x11, CONSTANT, %xmm5 - pxor %xmm5, %xmm4 -#endif - - pxor (BUF), %xmm1 - pxor 0x10(BUF), %xmm2 - pxor 0x20(BUF), %xmm3 - pxor 0x30(BUF), %xmm4 - - sub $0x40, LEN - add $0x40, BUF - cmp $0x40, LEN - jge .Lloop_64 -.Lless_64:/* Folding cache line into 128bit */ -#ifdef __x86_64__ - movdqa .Lconstant_R4R3(%rip), CONSTANT -#else - movdqa .Lconstant_R4R3, CONSTANT -#endif - prefetchnta (BUF) - - movdqa %xmm1, %xmm5 - pclmulqdq $0x00, CONSTANT, %xmm1 - pclmulqdq $0x11, CONSTANT, %xmm5 - pxor %xmm5, %xmm1 - pxor %xmm2, %xmm1 - - movdqa %xmm1, %xmm5 - pclmulqdq $0x00, CONSTANT, %xmm1 - pclmulqdq $0x11, CONSTANT, %xmm5 - pxor %xmm5, %xmm1 - pxor %xmm3, %xmm1 - - movdqa %xmm1, %xmm5 - pclmulqdq $0x00, CONSTANT, %xmm1 - pclmulqdq $0x11, CONSTANT, %xmm5 - pxor %xmm5, %xmm1 - pxor %xmm4, %xmm1 - - cmp $0x10, LEN - jb .Lfold_64 -.Lloop_16:/* Folding rest buffer into 128bit */ - movdqa %xmm1, %xmm5 - pclmulqdq $0x00, CONSTANT, %xmm1 - pclmulqdq $0x11, CONSTANT, %xmm5 - pxor %xmm5, %xmm1 - pxor (BUF), %xmm1 - sub $0x10, LEN - add $0x10, BUF - cmp $0x10, LEN - jge .Lloop_16 - -.Lfold_64: - /* perform the last 64 bit fold, also adds 32 zeroes - * to the input stream */ - pclmulqdq $0x01, %xmm1, CONSTANT /* R4 * xmm1.low */ - psrldq $0x08, %xmm1 - pxor CONSTANT, %xmm1 - - /* final 32-bit fold */ - movdqa %xmm1, %xmm2 -#ifdef __x86_64__ - movdqa .Lconstant_R5(%rip), CONSTANT - movdqa .Lconstant_mask32(%rip), %xmm3 -#else - movdqa .Lconstant_R5, CONSTANT - movdqa .Lconstant_mask32, %xmm3 -#endif - psrldq $0x04, %xmm2 - pand %xmm3, %xmm1 - pclmulqdq $0x00, CONSTANT, %xmm1 - pxor %xmm2, %xmm1 - - /* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */ -#ifdef __x86_64__ - movdqa .Lconstant_RUpoly(%rip), CONSTANT -#else - movdqa .Lconstant_RUpoly, CONSTANT -#endif - movdqa %xmm1, %xmm2 - pand %xmm3, %xmm1 - pclmulqdq $0x10, CONSTANT, %xmm1 - pand %xmm3, %xmm1 - pclmulqdq $0x00, CONSTANT, %xmm1 - pxor %xmm2, %xmm1 - pextrd $0x01, %xmm1, %eax - - RET -SYM_FUNC_END(crc32_pclmul_le_16) +DEFINE_CRC_PCLMUL_FUNCS(crc32_lsb, /* bits= */ 32, /* lsb= */ 1) From patchwork Mon Feb 10 17:45:40 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Biggers X-Patchwork-Id: 863934 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32E5A257AD8; Mon, 10 Feb 2025 17:46:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739209563; cv=none; b=IYJlnCQcxSHEG89eWnlimLpIFuQpIzmhJh/8anehlNC6ciGSAx2P5VrKy0mKWbZYS3zOEtpXcNXl7RbfiKJjdrqoKsZdvn+GwZ4JBh436xcCqp3jf2HOfNkzT4X0dlGYSKmZxp53f7Vx1DhPUogNbm2iRIKNCf3jx/tWCXEcI+0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739209563; c=relaxed/simple; bh=GiCuyfCp3pDwVSrj/ug9mnFeEyGHH5bBZJhukD5GYIE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=E3YTEVRwjqS2to7o4SN9vLpQc27leGEiUOLxG+y5uEHYLd+Absk3Ch4f5t0dY9MrjoqNfQwkw9/5oOVfpvZClb7+mnqOmU0dNF1ZKjne8BijSIPMw3L2GHm+Rp4QIooo1SqaPcvK1cm8tTIU1usAw4oO4gOOdm3CLMlsoXnxBmg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RX6r9yBQ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RX6r9yBQ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CB33EC4CEE9; Mon, 10 Feb 2025 17:46:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1739209563; bh=GiCuyfCp3pDwVSrj/ug9mnFeEyGHH5bBZJhukD5GYIE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RX6r9yBQKQjDleQddSyN9TacAZCorM8ZKRvB4FrA0hq6Lp5QPcojEM4bJe7eWAC9A uWOTX37m7YTGGE92COSgYR598AorH9q6ExfYqzjYDgmuLayKAiyb/aw0X5NHfl50ZK 2QsuqR4tGdnjvfZscTpYcPFgj5bamE93ExMwftKBtC6CpoaJnhDMAp2YmsiNfOXB04 uqZDq+ZYAFBLfYUxVmcRRUCQBthE1+b1Ak5otKM9z+cI5UC4liYSRy0I72gXR0+pmw QiAMRrKj4r3gu+/5gqGSqnz4XPXhU8D0JDNH4ICg9IgH8KzW7E5NLe0N3OnMBxVeMa 6DVq4KGTyKgAw== From: Eric Biggers To: linux-kernel@vger.kernel.org Cc: linux-crypto@vger.kernel.org, x86@kernel.org, linux-block@vger.kernel.org, Ard Biesheuvel , Keith Busch , Kent Overstreet , "Martin K . Petersen" Subject: [PATCH v4 6/6] x86/crc64: implement crc64_be and crc64_nvme using new template Date: Mon, 10 Feb 2025 09:45:40 -0800 Message-ID: <20250210174540.161705-7-ebiggers@kernel.org> X-Mailer: git-send-email 2.48.1 In-Reply-To: <20250210174540.161705-1-ebiggers@kernel.org> References: <20250210174540.161705-1-ebiggers@kernel.org> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Eric Biggers Add x86_64 [V]PCLMULQDQ optimized implementations of crc64_be() and crc64_nvme() by wiring them up to crc-pclmul-template.S. crc64_be() is used by bcache and bcachefs, and crc64_nvme() is used by blk-integrity. Both features can CRC large amounts of data, and the developers of both features have expressed interest in having these CRCs be optimized. So this optimization should be worthwhile. (See https://lore.kernel.org/r/v36sousjd5ukqlkpdxslvpu7l37zbu7d7slgc2trjjqwty2bny@qgzew34feo2r and https://lore.kernel.org/r/20220222163144.1782447-11-kbusch@kernel.org) Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit: crc64_be: Length Before After ------ ------ ----- 1 633 MB/s 477 MB/s 16 717 MB/s 2517 MB/s 64 715 MB/s 7525 MB/s 127 714 MB/s 10002 MB/s 128 713 MB/s 13344 MB/s 200 715 MB/s 15752 MB/s 256 714 MB/s 22933 MB/s 511 715 MB/s 28025 MB/s 512 714 MB/s 49772 MB/s 1024 715 MB/s 65261 MB/s 3173 714 MB/s 78773 MB/s 4096 714 MB/s 83315 MB/s 16384 714 MB/s 89487 MB/s crc64_nvme: Length Before After ------ ------ ----- 1 716 MB/s 474 MB/s 16 717 MB/s 3303 MB/s 64 713 MB/s 7940 MB/s 127 715 MB/s 9867 MB/s 128 714 MB/s 13698 MB/s 200 715 MB/s 15995 MB/s 256 714 MB/s 23479 MB/s 511 714 MB/s 28013 MB/s 512 715 MB/s 51533 MB/s 1024 715 MB/s 66788 MB/s 3173 715 MB/s 79182 MB/s 4096 715 MB/s 83966 MB/s 16384 715 MB/s 89739 MB/s Acked-by: Keith Busch Reviewed-by: "Martin K. Petersen" Signed-off-by: Eric Biggers --- arch/x86/Kconfig | 1 + arch/x86/lib/Makefile | 3 + arch/x86/lib/crc-pclmul-consts.h | 98 +++++++++++++++++++++++++++++++- arch/x86/lib/crc64-glue.c | 50 ++++++++++++++++ arch/x86/lib/crc64-pclmul.S | 7 +++ 5 files changed, 158 insertions(+), 1 deletion(-) create mode 100644 arch/x86/lib/crc64-glue.c create mode 100644 arch/x86/lib/crc64-pclmul.S diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7f59d73201ce7..aa7c9d57e4d37 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -75,10 +75,11 @@ config X86 select ARCH_HAS_CACHE_LINE_SIZE select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION select ARCH_HAS_CPU_FINALIZE_INIT select ARCH_HAS_CPU_PASID if IOMMU_SVA select ARCH_HAS_CRC32 + select ARCH_HAS_CRC64 if X86_64 select ARCH_HAS_CRC_T10DIF select ARCH_HAS_CURRENT_STACK_POINTER select ARCH_HAS_DEBUG_VIRTUAL select ARCH_HAS_DEBUG_VM_PGTABLE if !X86_PAE select ARCH_HAS_DEVMEM_IS_ALLOWED diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile index 08496e221a7d1..71c14329fd799 100644 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -40,10 +40,13 @@ lib-$(CONFIG_MITIGATION_RETPOLINE) += retpoline.o obj-$(CONFIG_CRC32_ARCH) += crc32-x86.o crc32-x86-y := crc32-glue.o crc32-pclmul.o crc32-x86-$(CONFIG_64BIT) += crc32c-3way.o +obj-$(CONFIG_CRC64_ARCH) += crc64-x86.o +crc64-x86-y := crc64-glue.o crc64-pclmul.o + obj-$(CONFIG_CRC_T10DIF_ARCH) += crc-t10dif-x86.o crc-t10dif-x86-y := crc-t10dif-glue.o crc16-msb-pclmul.o obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o obj-y += iomem.o diff --git a/arch/x86/lib/crc-pclmul-consts.h b/arch/x86/lib/crc-pclmul-consts.h index 089954988f977..fcc63c0643330 100644 --- a/arch/x86/lib/crc-pclmul-consts.h +++ b/arch/x86/lib/crc-pclmul-consts.h @@ -1,10 +1,10 @@ /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * CRC constants generated by: * - * ./scripts/gen-crc-consts.py x86_pclmul crc16_msb_0x8bb7,crc32_lsb_0xedb88320 + * ./scripts/gen-crc-consts.py x86_pclmul crc16_msb_0x8bb7,crc32_lsb_0xedb88320,crc64_msb_0x42f0e1eba9ea3693,crc64_lsb_0x9a6c9329ac4bc9b5 * * Do not edit manually. */ /* @@ -95,5 +95,101 @@ static const struct { .barrett_reduction_consts = { 0xb4e5b025f7011641, /* HI64_TERMS: floor(x^95 / G) */ 0x00000001db710640, /* LO64_TERMS: (G - x^32) * x^31 */ }, }; + +/* + * CRC folding constants generated for most-significant-bit-first CRC-64 using + * G(x) = x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 + + * x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 + + * x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 + + * x^7 + x^4 + x^1 + x^0 + */ +static const struct { + u8 bswap_mask[16]; + u64 fold_across_2048_bits_consts[2]; + u64 fold_across_1024_bits_consts[2]; + u64 fold_across_512_bits_consts[2]; + u64 fold_across_256_bits_consts[2]; + u64 fold_across_128_bits_consts[2]; + u8 shuf_table[48]; + u64 barrett_reduction_consts[2]; +} crc64_msb_0x42f0e1eba9ea3693_consts ____cacheline_aligned __maybe_unused = { + .bswap_mask = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0}, + .fold_across_2048_bits_consts = { + 0x7f52691a60ddc70d, /* LO64_TERMS: (x^2048 mod G) * x^0 */ + 0x7036b0389f6a0c82, /* HI64_TERMS: (x^2112 mod G) * x^0 */ + }, + .fold_across_1024_bits_consts = { + 0x05cf79dea9ac37d6, /* LO64_TERMS: (x^1024 mod G) * x^0 */ + 0x001067e571d7d5c2, /* HI64_TERMS: (x^1088 mod G) * x^0 */ + }, + .fold_across_512_bits_consts = { + 0x5f6843ca540df020, /* LO64_TERMS: (x^512 mod G) * x^0 */ + 0xddf4b6981205b83f, /* HI64_TERMS: (x^576 mod G) * x^0 */ + }, + .fold_across_256_bits_consts = { + 0x571bee0a227ef92b, /* LO64_TERMS: (x^256 mod G) * x^0 */ + 0x44bef2a201b5200c, /* HI64_TERMS: (x^320 mod G) * x^0 */ + }, + .fold_across_128_bits_consts = { + 0x05f5c3c7eb52fab6, /* LO64_TERMS: (x^128 mod G) * x^0 */ + 0x4eb938a7d257740e, /* HI64_TERMS: (x^192 mod G) * x^0 */ + }, + .shuf_table = { + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + }, + .barrett_reduction_consts = { + 0x42f0e1eba9ea3693, /* LO64_TERMS: (G - x^64) * x^0 */ + 0x578d29d06cc4f872, /* HI64_TERMS: (floor(x^127 / G) * x) - x^64 */ + }, +}; + +/* + * CRC folding constants generated for least-significant-bit-first CRC-64 using + * G(x) = x^64 + x^63 + x^61 + x^59 + x^58 + x^56 + x^55 + x^52 + x^49 + x^48 + + * x^47 + x^46 + x^44 + x^41 + x^37 + x^36 + x^34 + x^32 + x^31 + x^28 + + * x^26 + x^23 + x^22 + x^19 + x^16 + x^13 + x^12 + x^10 + x^9 + x^6 + + * x^4 + x^3 + x^0 + */ +static const struct { + u64 fold_across_2048_bits_consts[2]; + u64 fold_across_1024_bits_consts[2]; + u64 fold_across_512_bits_consts[2]; + u64 fold_across_256_bits_consts[2]; + u64 fold_across_128_bits_consts[2]; + u8 shuf_table[48]; + u64 barrett_reduction_consts[2]; +} crc64_lsb_0x9a6c9329ac4bc9b5_consts ____cacheline_aligned __maybe_unused = { + .fold_across_2048_bits_consts = { + 0x37ccd3e14069cabc, /* HI64_TERMS: (x^2111 mod G) * x^0 */ + 0xa043808c0f782663, /* LO64_TERMS: (x^2047 mod G) * x^0 */ + }, + .fold_across_1024_bits_consts = { + 0xa1ca681e733f9c40, /* HI64_TERMS: (x^1087 mod G) * x^0 */ + 0x5f852fb61e8d92dc, /* LO64_TERMS: (x^1023 mod G) * x^0 */ + }, + .fold_across_512_bits_consts = { + 0x0c32cdb31e18a84a, /* HI64_TERMS: (x^575 mod G) * x^0 */ + 0x62242240ace5045a, /* LO64_TERMS: (x^511 mod G) * x^0 */ + }, + .fold_across_256_bits_consts = { + 0xb0bc2e589204f500, /* HI64_TERMS: (x^319 mod G) * x^0 */ + 0xe1e0bb9d45d7a44c, /* LO64_TERMS: (x^255 mod G) * x^0 */ + }, + .fold_across_128_bits_consts = { + 0xeadc41fd2ba3d420, /* HI64_TERMS: (x^191 mod G) * x^0 */ + 0x21e9761e252621ac, /* LO64_TERMS: (x^127 mod G) * x^0 */ + }, + .shuf_table = { + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + }, + .barrett_reduction_consts = { + 0x27ecfa329aef9f77, /* HI64_TERMS: floor(x^127 / G) */ + 0x34d926535897936a, /* LO64_TERMS: (G - x^64 - x^0) / x */ + }, +}; diff --git a/arch/x86/lib/crc64-glue.c b/arch/x86/lib/crc64-glue.c new file mode 100644 index 0000000000000..b0e1b719ecbfb --- /dev/null +++ b/arch/x86/lib/crc64-glue.c @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * CRC64 using [V]PCLMULQDQ instructions + * + * Copyright 2025 Google LLC + */ + +#include +#include +#include "crc-pclmul-template.h" + +static DEFINE_STATIC_KEY_FALSE(have_pclmulqdq); + +DECLARE_CRC_PCLMUL_FUNCS(crc64_msb, u64); +DECLARE_CRC_PCLMUL_FUNCS(crc64_lsb, u64); + +u64 crc64_be_arch(u64 crc, const u8 *p, size_t len) +{ + CRC_PCLMUL(crc, p, len, crc64_msb, crc64_msb_0x42f0e1eba9ea3693_consts, + have_pclmulqdq); + return crc64_be_generic(crc, p, len); +} +EXPORT_SYMBOL_GPL(crc64_be_arch); + +u64 crc64_nvme_arch(u64 crc, const u8 *p, size_t len) +{ + CRC_PCLMUL(crc, p, len, crc64_lsb, crc64_lsb_0x9a6c9329ac4bc9b5_consts, + have_pclmulqdq); + return crc64_nvme_generic(crc, p, len); +} +EXPORT_SYMBOL_GPL(crc64_nvme_arch); + +static int __init crc64_x86_init(void) +{ + if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) { + static_branch_enable(&have_pclmulqdq); + INIT_CRC_PCLMUL(crc64_msb); + INIT_CRC_PCLMUL(crc64_lsb); + } + return 0; +} +arch_initcall(crc64_x86_init); + +static void __exit crc64_x86_exit(void) +{ +} +module_exit(crc64_x86_exit); + +MODULE_DESCRIPTION("CRC64 using [V]PCLMULQDQ instructions"); +MODULE_LICENSE("GPL"); diff --git a/arch/x86/lib/crc64-pclmul.S b/arch/x86/lib/crc64-pclmul.S new file mode 100644 index 0000000000000..4173051b5197c --- /dev/null +++ b/arch/x86/lib/crc64-pclmul.S @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +// Copyright 2025 Google LLC + +#include "crc-pclmul-template.S" + +DEFINE_CRC_PCLMUL_FUNCS(crc64_msb, /* bits= */ 64, /* lsb= */ 0) +DEFINE_CRC_PCLMUL_FUNCS(crc64_lsb, /* bits= */ 64, /* lsb= */ 1)