From patchwork Fri Feb 28 10:00:12 2025
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 869384
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
    yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev,
    usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com,
    ying.huang@linux.alibaba.com, akpm@linux-foundation.org,
    linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au,
    davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org,
    ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com,
    kanchana.p.sridhar@intel.com
Subject: [PATCH v7 03/15] crypto: iaa - Add an acomp_req flag CRYPTO_ACOMP_REQ_POLL to enable async mode.
Date: Fri, 28 Feb 2025 02:00:12 -0800
Message-Id: <20250228100024.332528-4-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>
References: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>

If the iaa_crypto driver has async_mode set to true and use_irq set to
false, it can still be forced to use synchronous mode by turning off
the CRYPTO_ACOMP_REQ_POLL flag in req->flags. In other words, all three
of the following need to be true for a request to be processed in
fully async poll mode:

1) async_mode should be "true"
2) use_irq should be "false"
3) req->flags & CRYPTO_ACOMP_REQ_POLL should be "true"

Suggested-by: Herbert Xu
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 11 ++++++++++-
 include/crypto/acompress.h                 |  5 +++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index c3776b0de51d..d7983ab3c34a 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -1520,6 +1520,10 @@ static int iaa_comp_acompress(struct acomp_req *req)
 		return -EINVAL;
 	}
 
+	/* If the caller has requested no polling, disable async. */
+	if (!(req->flags & CRYPTO_ACOMP_REQ_POLL))
+		disable_async = true;
+
 	cpu = get_cpu();
 	wq = wq_table_next_wq(cpu);
 	put_cpu();
@@ -1712,6 +1716,7 @@ static int iaa_comp_adecompress(struct acomp_req *req)
 {
 	struct crypto_tfm *tfm = req->base.tfm;
 	dma_addr_t src_addr, dst_addr;
+	bool disable_async = false;
 	int nr_sgs, cpu, ret = 0;
 	struct iaa_wq *iaa_wq;
 	struct device *dev;
@@ -1727,6 +1732,10 @@ static int iaa_comp_adecompress(struct acomp_req *req)
 		return -EINVAL;
 	}
 
+	/* If the caller has requested no polling, disable async. */
+	if (!(req->flags & CRYPTO_ACOMP_REQ_POLL))
+		disable_async = true;
+
 	if (!req->dst)
 		return iaa_comp_adecompress_alloc_dest(req);
 
@@ -1775,7 +1784,7 @@ static int iaa_comp_adecompress(struct acomp_req *req)
 			req->dst, req->dlen, sg_dma_len(req->dst));
 
 	ret = iaa_decompress(tfm, req, wq, src_addr, req->slen,
-			     dst_addr, &req->dlen, false);
+			     dst_addr, &req->dlen, disable_async);
 	if (ret == -EINPROGRESS)
 		return ret;
 
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 147f184b6bea..afadf84f236d 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -14,6 +14,11 @@
 #include
 
 #define CRYPTO_ACOMP_ALLOC_OUTPUT	0x00000001
+/*
+ * If set, the driver must have a way to submit the req, then
+ * poll its completion status for success/error.
+ */
+#define CRYPTO_ACOMP_REQ_POLL		0x00000002
 #define CRYPTO_ACOMP_DST_MAX		131072
 
 /**
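
[Editorial note] For context, a minimal sketch of how a caller might use
the new flag, assuming a tfm bound to an async driver such as
"deflate-iaa" and caller-managed scatterlists; compress_one() is a
hypothetical helper, not part of this patch:

	#include <crypto/acomp.h>
	#include <linux/scatterlist.h>

	/*
	 * Hypothetical caller: force the fully synchronous path by
	 * leaving CRYPTO_ACOMP_REQ_POLL clear, even when the driver
	 * was loaded with async_mode=true and use_irq=false.
	 */
	static int compress_one(struct crypto_acomp *tfm,
				struct scatterlist *src,
				struct scatterlist *dst,
				unsigned int slen, unsigned int dlen)
	{
		struct acomp_req *req = acomp_request_alloc(tfm);
		int ret;

		if (!req)
			return -ENOMEM;

		acomp_request_set_params(req, src, dst, slen, dlen);

		/*
		 * A fresh request has the poll bit clear; setting
		 * req->flags |= CRYPTO_ACOMP_REQ_POLL instead opts in
		 * to submit-then-poll async mode (used with request
		 * chaining in the next patch).
		 */
		ret = crypto_acomp_compress(req);

		acomp_request_free(req);
		return ret;
	}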

From patchwork Fri Feb 28 10:00:13 2025
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 869383
From: Kanchana P Sridhar
Subject: [PATCH v7 04/15] crypto: iaa - Implement batch compression/decompression with request chaining.
Date: Fri, 28 Feb 2025 02:00:13 -0800
Message-Id: <20250228100024.332528-5-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>
References: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>

This patch provides the iaa_crypto driver implementation for the newly
added crypto_acomp "get_batch_size()" interface, which will be called
when swap modules invoke crypto_acomp_batch_size() to query the maximum
batch size. For IAA, this returns a driver-specific constant,
IAA_CRYPTO_MAX_BATCH_SIZE (currently set to 8U).

This allows swap modules such as zswap/zram to allocate the required
batching resources and then invoke fully asynchronous, batch-parallel
compression/decompression of pages on systems with Intel IAA, by
setting up a request chain and calling crypto_acomp_compress() or
crypto_acomp_decompress() with the head request in the chain.

This enables zswap compress-batching code to be developed in a manner
similar to the current single-page synchronous calls to
crypto_acomp_compress() and crypto_acomp_decompress(), thereby
facilitating an encapsulated and modular hand-off between the kernel
zswap/zram code and the crypto_acomp layer.

This patch also provides implementations of IAA batching with request
chaining for both iaa_crypto sync modes: asynchronous/no-irq and fully
synchronous.

Since iaa_crypto supports the use of acomp request chaining, this patch
also adds CRYPTO_ALG_REQ_CHAIN to the iaa_acomp_fixed_deflate
algorithm's cra_flags.

Suggested-by: Herbert Xu
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |   9 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 186 ++++++++++++++++++++-
 2 files changed, 192 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h
index 56985e395263..45d94a646636 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto.h
+++ b/drivers/crypto/intel/iaa/iaa_crypto.h
@@ -39,6 +39,15 @@
 				IAA_DECOMP_CHECK_FOR_EOB | \
 				IAA_DECOMP_STOP_ON_EOB)
 
+/*
+ * The maximum compress/decompress batch size for IAA's implementation of
+ * batched compressions/decompressions.
+ * The IAA compression algorithms should provide the crypto_acomp
+ * get_batch_size() interface through a function that returns this
+ * constant.
+ */
+#define IAA_CRYPTO_MAX_BATCH_SIZE	8U
+
 /* Representation of IAA workqueue */
 struct iaa_wq {
 	struct list_head list;
diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index d7983ab3c34a..a9800b8f3575 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -1807,6 +1807,185 @@ static void compression_ctx_init(struct iaa_compression_ctx *ctx)
 	ctx->use_irq = use_irq;
 }
 
+static int iaa_comp_poll(struct acomp_req *req)
+{
+	struct idxd_desc *idxd_desc;
+	struct idxd_device *idxd;
+	struct iaa_wq *iaa_wq;
+	struct pci_dev *pdev;
+	struct device *dev;
+	struct idxd_wq *wq;
+	bool compress_op;
+	int ret;
+
+	idxd_desc = req->base.data;
+	if (!idxd_desc)
+		return -EAGAIN;
+
+	compress_op = (idxd_desc->iax_hw->opcode == IAX_OPCODE_COMPRESS);
+	wq = idxd_desc->wq;
+	iaa_wq = idxd_wq_get_private(wq);
+	idxd = iaa_wq->iaa_device->idxd;
+	pdev = idxd->pdev;
+	dev = &pdev->dev;
+
+	ret = check_completion(dev, idxd_desc->iax_completion, true, true);
+	if (ret == -EAGAIN)
+		return ret;
+	if (ret)
+		goto out;
+
+	req->dlen = idxd_desc->iax_completion->output_size;
+
+	/* Update stats */
+	if (compress_op) {
+		update_total_comp_bytes_out(req->dlen);
+		update_wq_comp_bytes(wq, req->dlen);
+	} else {
+		update_total_decomp_bytes_in(req->slen);
+		update_wq_decomp_bytes(wq, req->slen);
+	}
+
+	if (iaa_verify_compress && (idxd_desc->iax_hw->opcode == IAX_OPCODE_COMPRESS)) {
+		struct crypto_tfm *tfm = req->base.tfm;
+		dma_addr_t src_addr, dst_addr;
+		u32 compression_crc;
+
+		compression_crc = idxd_desc->iax_completion->crc;
+
+		dma_sync_sg_for_device(dev, req->dst, 1, DMA_FROM_DEVICE);
+		dma_sync_sg_for_device(dev, req->src, 1, DMA_TO_DEVICE);
+
+		src_addr = sg_dma_address(req->src);
+		dst_addr = sg_dma_address(req->dst);
+
+		ret = iaa_compress_verify(tfm, req, wq, src_addr, req->slen,
+					  dst_addr, &req->dlen, compression_crc);
+	}
+out:
+	/* caller doesn't call crypto_wait_req, so no acomp_request_complete() */
+
+	dma_unmap_sg(dev, req->dst, sg_nents(req->dst), DMA_FROM_DEVICE);
+	dma_unmap_sg(dev, req->src, sg_nents(req->src), DMA_TO_DEVICE);
+
+	idxd_free_desc(idxd_desc->wq, idxd_desc);
+
+	dev_dbg(dev, "%s: returning ret=%d\n", __func__, ret);
+
+	return ret;
+}
+
+static unsigned int iaa_comp_get_batch_size(void)
+{
+	return IAA_CRYPTO_MAX_BATCH_SIZE;
+}
+
+static void iaa_set_reqchain_poll(
+	struct acomp_req *req0,
+	bool set_flag)
+{
+	struct acomp_req *req;
+
+	set_flag ? (req0->flags |= CRYPTO_ACOMP_REQ_POLL) :
+		   (req0->flags &= ~CRYPTO_ACOMP_REQ_POLL);
+
+	list_for_each_entry(req, &req0->base.list, base.list)
+		set_flag ? (req->flags |= CRYPTO_ACOMP_REQ_POLL) :
+			   (req->flags &= ~CRYPTO_ACOMP_REQ_POLL);
+}
+
+/**
+ * This API provides IAA compress batching functionality for use by swap
+ * modules. Batching is implemented using request chaining.
+ *
+ * @req: The head asynchronous compress request in the chain.
+ *
+ * Returns the compression error status (0 or -errno) of the last
+ * request that finishes. Caller should call acomp_request_err()
+ * for each request in the chain, to get its error status.
+ */
+static int iaa_comp_acompress_batch(struct acomp_req *req)
+{
+	bool async = (async_mode && !use_irq);
+	int err = 0;
+
+	if (likely(async))
+		iaa_set_reqchain_poll(req, true);
+	else
+		iaa_set_reqchain_poll(req, false);
+
+	if (likely(async))
+		/* Process the request chain in parallel. */
+		err = acomp_do_async_req_chain(req, iaa_comp_acompress, iaa_comp_poll);
+	else
+		/* Process the request chain in series. */
+		err = acomp_do_req_chain(req, iaa_comp_acompress);
+
+	/*
+	 * For the same request chain to be usable by
+	 * iaa_comp_acompress()/iaa_comp_adecompress() in synchronous mode,
+	 * clear the CRYPTO_ACOMP_REQ_POLL bit on all acomp_reqs.
+	 */
+	iaa_set_reqchain_poll(req, false);
+
+	return err;
+}
+
+/**
+ * This API provides IAA decompress batching functionality for use by swap
+ * modules. Batching is implemented using request chaining.
+ *
+ * @req: The head asynchronous decompress request in the chain.
+ *
+ * Returns the decompression error status (0 or -errno) of the last
+ * request that finishes. Caller should call acomp_request_err()
+ * for each request in the chain, to get its error status.
+ */
+static int iaa_comp_adecompress_batch(struct acomp_req *req)
+{
+	bool async = (async_mode && !use_irq);
+	int err = 0;
+
+	if (likely(async))
+		iaa_set_reqchain_poll(req, true);
+	else
+		iaa_set_reqchain_poll(req, false);
+
+	if (likely(async))
+		/* Process the request chain in parallel. */
+		err = acomp_do_async_req_chain(req, iaa_comp_adecompress, iaa_comp_poll);
+	else
+		/* Process the request chain in series. */
+		err = acomp_do_req_chain(req, iaa_comp_adecompress);
+
+	/*
+	 * For the same request chain to be usable by
+	 * iaa_comp_acompress()/iaa_comp_adecompress() in synchronous mode,
+	 * clear the CRYPTO_ACOMP_REQ_POLL bit on all acomp_reqs.
+	 */
+	iaa_set_reqchain_poll(req, false);
+
+	return err;
+}
+
+static int iaa_compress_main(struct acomp_req *req)
+{
+	if (acomp_is_reqchain(req))
+		return iaa_comp_acompress_batch(req);
+
+	return iaa_comp_acompress(req);
+}
+
+static int iaa_decompress_main(struct acomp_req *req)
+{
+	if (acomp_is_reqchain(req))
+		return iaa_comp_adecompress_batch(req);
+
+	return iaa_comp_adecompress(req);
+}
+
 static int iaa_comp_init_fixed(struct crypto_acomp *acomp_tfm)
 {
 	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp_tfm);
@@ -1829,13 +2008,14 @@ static void dst_free(struct scatterlist *sgl)
 
 static struct acomp_alg iaa_acomp_fixed_deflate = {
 	.init			= iaa_comp_init_fixed,
-	.compress		= iaa_comp_acompress,
-	.decompress		= iaa_comp_adecompress,
+	.compress		= iaa_compress_main,
+	.decompress		= iaa_decompress_main,
 	.dst_free		= dst_free,
+	.get_batch_size		= iaa_comp_get_batch_size,
 	.base			= {
 		.cra_name		= "deflate",
 		.cra_driver_name	= "deflate-iaa",
-		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_flags		= CRYPTO_ALG_ASYNC | CRYPTO_ALG_REQ_CHAIN,
 		.cra_ctxsize		= sizeof(struct iaa_compression_ctx),
 		.cra_module		= THIS_MODULE,
 		.cra_priority		= IAA_ALG_PRIORITY,
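
[Editorial note] To illustrate the intended calling convention, here is
a minimal sketch of how a swap module might drive IAA batch compression.
It assumes the request-chaining helpers from the earlier patches in this
series; acomp_request_chain() is named here as an assumption about that
API, and compress_batch() is a hypothetical caller:

	/*
	 * Sketch only: batch compression via request chaining.
	 * Assumes acomp_request_chain() links reqs[i] behind the
	 * head request reqs[0].
	 */
	static int compress_batch(struct crypto_acomp *tfm,
				  struct acomp_req **reqs, int nr)
	{
		unsigned int batch = min_t(unsigned int, nr,
					   crypto_acomp_batch_size(tfm));
		int i, err;

		for (i = 1; i < batch; i++)
			acomp_request_chain(reqs[i], reqs[0]);

		/*
		 * One call compresses the whole chain; with
		 * async_mode=true and use_irq=false the driver submits
		 * all descriptors, then polls for their completions.
		 */
		err = crypto_acomp_compress(reqs[0]);

		/* err reflects the last request to finish; check each one. */
		for (i = 0; i < batch; i++) {
			if (acomp_request_err(reqs[i]))
				return acomp_request_err(reqs[i]);
		}

		return err;
	}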

From patchwork Fri Feb 28 10:00:14 2025
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 869382
From: Kanchana P Sridhar
Subject: [PATCH v7 05/15] crypto: iaa - Enable async mode and make it the default.
Date: Fri, 28 Feb 2025 02:00:14 -0800
Message-Id: <20250228100024.332528-6-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>
References: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>

This patch enables the 'async' sync_mode in the driver.
Further, it makes 'async' the default sync_mode, so that the iaa_crypto
driver is loaded by default in the most efficient/recommended mode for
parallel compressions/decompressions, namely asynchronous submission of
descriptors, followed by polling for job completions with or without
request chaining. Earlier, "sync" mode used to be the default.

This way, anyone who wants to use IAA for zswap/zram can do so after
building the kernel, without having to go through these steps to use
async mode:

1) disable all the IAA device/wq bindings that happen at boot time
2) rmmod iaa_crypto
3) modprobe iaa_crypto
4) echo async > /sys/bus/dsa/drivers/crypto/sync_mode
5) re-run initialization of the IAA devices and wqs

Signed-off-by: Kanchana P Sridhar
---
 Documentation/driver-api/crypto/iaa/iaa-crypto.rst | 11 ++---------
 drivers/crypto/intel/iaa/iaa_crypto_main.c         |  4 ++--
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/Documentation/driver-api/crypto/iaa/iaa-crypto.rst b/Documentation/driver-api/crypto/iaa/iaa-crypto.rst
index 8e50b900d51c..782da5230fcd 100644
--- a/Documentation/driver-api/crypto/iaa/iaa-crypto.rst
+++ b/Documentation/driver-api/crypto/iaa/iaa-crypto.rst
@@ -272,7 +272,7 @@ The available attributes are:
       echo async_irq > /sys/bus/dsa/drivers/crypto/sync_mode
 
   Async mode without interrupts (caller must poll) can be enabled by
-  writing 'async' to it (please see Caveat)::
+  writing 'async' to it::
 
       echo async > /sys/bus/dsa/drivers/crypto/sync_mode
 
@@ -281,14 +281,7 @@ The available attributes are:
 
       echo sync > /sys/bus/dsa/drivers/crypto/sync_mode
 
-  The default mode is 'sync'.
-
-  Caveat: since the only mechanism that iaa_crypto currently implements
-  for async polling without interrupts is via the 'sync' mode as
-  described earlier, writing 'async' to
-  '/sys/bus/dsa/drivers/crypto/sync_mode' will internally enable the
-  'sync' mode. This is to ensure correct iaa_crypto behavior until true
-  async polling without interrupts is enabled in iaa_crypto.
+  The default mode is 'async'.
 
 .. _iaa_default_config:

diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c
index a9800b8f3575..4dac4852c113 100644
--- a/drivers/crypto/intel/iaa/iaa_crypto_main.c
+++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c
@@ -153,7 +153,7 @@ static DRIVER_ATTR_RW(verify_compress);
  */
 
 /* Use async mode */
-static bool async_mode;
+static bool async_mode = true;
 
 /* Use interrupts */
 static bool use_irq;
@@ -173,7 +173,7 @@ static int set_iaa_sync_mode(const char *name)
 		async_mode = false;
 		use_irq = false;
 	} else if (sysfs_streq(name, "async")) {
-		async_mode = false;
+		async_mode = true;
 		use_irq = false;
 	} else if (sysfs_streq(name, "async_irq")) {
 		async_mode = true;
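
[Editorial note] Putting patches 03 and 05 together, the driver-wide
knobs and the per-request flag combine as restated below. This is a
sketch for clarity, not new driver code; iaa_req_is_async_poll() is a
hypothetical helper:

	/*
	 * sync_mode    async_mode  use_irq  CRYPTO_ACOMP_REQ_POLL  path
	 * "sync"       false       false    (ignored)              synchronous
	 * "async"      true        false    set                    submit + poll
	 * "async"      true        false    clear                  synchronous
	 * "async_irq"  true        true     (ignored)              interrupt-driven
	 */
	static bool iaa_req_is_async_poll(struct acomp_req *req)
	{
		return async_mode && !use_irq &&
		       (req->flags & CRYPTO_ACOMP_REQ_POLL);
	}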

From patchwork Fri Feb 28 10:00:17 2025
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 869381
From: Kanchana P Sridhar
Subject: [PATCH v7 08/15] crypto: iaa - Map IAA devices/wqs to cores based on packages instead of NUMA.
Date: Fri, 28 Feb 2025 02:00:17 -0800
Message-Id: <20250228100024.332528-9-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>
References: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>

This patch modifies the algorithm for mapping available IAA devices and
wqs to cores, as they are being discovered, based on packages instead
of NUMA nodes. This gives a more realistic mapping of IAA devices as
compression/decompression resources for a package, rather than for a
NUMA node. It also resolves problems that were observed during internal
validation on Intel platforms with many more NUMA nodes than packages:
in such cases, the earlier NUMA-based allocation caused some IAAs to be
over-subscribed and others to not be utilized at all.

As a result of this change from NUMA to packages, some of the core
functions used by the iaa_crypto driver's "probe" and "remove" APIs
have been re-written. The new infrastructure maintains a static/global
mapping of "local wqs" per IAA device, in the "struct iaa_device"
itself. The earlier implementation would allocate memory per-cpu for
this data, which never changes once the IAA devices/wqs have been
initialized.

Two main outcomes of this new iaa_crypto driver infrastructure are:

1) Resolves "task blocked for more than x seconds" errors observed
   during internal validation on Intel systems with the earlier
   NUMA-node-based mappings, which were root-caused to the non-optimal
   IAA-to-core mappings described earlier.

2) Gives a NUM_THREADS-factor reduction in the memory footprint of
   initializing IAA devices/wqs, by eliminating the per-cpu copies of
   each IAA device's wqs. On a 384-core Intel Granite Rapids server
   with 8 IAA devices, this saves 140MiB.
Signed-off-by: Kanchana P Sridhar --- drivers/crypto/intel/iaa/iaa_crypto.h | 17 +- drivers/crypto/intel/iaa/iaa_crypto_main.c | 276 ++++++++++++--------- 2 files changed, 171 insertions(+), 122 deletions(-) diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h index 45d94a646636..72ffdf55f7b3 100644 --- a/drivers/crypto/intel/iaa/iaa_crypto.h +++ b/drivers/crypto/intel/iaa/iaa_crypto.h @@ -55,6 +55,7 @@ struct iaa_wq { struct idxd_wq *wq; int ref; bool remove; + bool mapped; struct iaa_device *iaa_device; @@ -72,6 +73,13 @@ struct iaa_device_compression_mode { dma_addr_t aecs_comp_table_dma_addr; }; +struct wq_table_entry { + struct idxd_wq **wqs; + int max_wqs; + int n_wqs; + int cur_wq; +}; + /* Representation of IAA device with wqs, populated by probe */ struct iaa_device { struct list_head list; @@ -82,19 +90,14 @@ struct iaa_device { int n_wq; struct list_head wqs; + struct wq_table_entry *iaa_local_wqs; + atomic64_t comp_calls; atomic64_t comp_bytes; atomic64_t decomp_calls; atomic64_t decomp_bytes; }; -struct wq_table_entry { - struct idxd_wq **wqs; - int max_wqs; - int n_wqs; - int cur_wq; -}; - #define IAA_AECS_ALIGN 32 /* diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c index abaee160e5ec..40751d7c83c0 100644 --- a/drivers/crypto/intel/iaa/iaa_crypto_main.c +++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c @@ -30,8 +30,9 @@ /* number of iaa instances probed */ static unsigned int nr_iaa; static unsigned int nr_cpus; -static unsigned int nr_nodes; -static unsigned int nr_cpus_per_node; +static unsigned int nr_packages; +static unsigned int nr_cpus_per_package; +static unsigned int nr_iaa_per_package; /* Number of physical cpus sharing each iaa instance */ static unsigned int cpus_per_iaa; @@ -462,17 +463,46 @@ static void remove_device_compression_modes(struct iaa_device *iaa_device) * Functions for use in crypto probe and remove interfaces: * allocate/init/query/deallocate devices/wqs. ***********************************************************/ -static struct iaa_device *iaa_device_alloc(void) +static struct iaa_device *iaa_device_alloc(struct idxd_device *idxd) { + struct wq_table_entry *local; struct iaa_device *iaa_device; iaa_device = kzalloc(sizeof(*iaa_device), GFP_KERNEL); if (!iaa_device) - return NULL; + goto err; + + iaa_device->idxd = idxd; + + /* IAA device's local wqs. 
*/ + iaa_device->iaa_local_wqs = kzalloc(sizeof(struct wq_table_entry), GFP_KERNEL); + if (!iaa_device->iaa_local_wqs) + goto err; + + local = iaa_device->iaa_local_wqs; + + local->wqs = kzalloc(iaa_device->idxd->max_wqs * sizeof(struct wq *), GFP_KERNEL); + if (!local->wqs) + goto err; + + local->max_wqs = iaa_device->idxd->max_wqs; + local->n_wqs = 0; INIT_LIST_HEAD(&iaa_device->wqs); return iaa_device; + +err: + if (iaa_device) { + if (iaa_device->iaa_local_wqs) { + if (iaa_device->iaa_local_wqs->wqs) + kfree(iaa_device->iaa_local_wqs->wqs); + kfree(iaa_device->iaa_local_wqs); + } + kfree(iaa_device); + } + + return NULL; } static bool iaa_has_wq(struct iaa_device *iaa_device, struct idxd_wq *wq) @@ -491,12 +521,10 @@ static struct iaa_device *add_iaa_device(struct idxd_device *idxd) { struct iaa_device *iaa_device; - iaa_device = iaa_device_alloc(); + iaa_device = iaa_device_alloc(idxd); if (!iaa_device) return NULL; - iaa_device->idxd = idxd; - list_add_tail(&iaa_device->list, &iaa_devices); nr_iaa++; @@ -537,6 +565,7 @@ static int add_iaa_wq(struct iaa_device *iaa_device, struct idxd_wq *wq, iaa_wq->wq = wq; iaa_wq->iaa_device = iaa_device; idxd_wq_set_private(wq, iaa_wq); + iaa_wq->mapped = false; list_add_tail(&iaa_wq->list, &iaa_device->wqs); @@ -580,6 +609,13 @@ static void free_iaa_device(struct iaa_device *iaa_device) return; remove_device_compression_modes(iaa_device); + + if (iaa_device->iaa_local_wqs) { + if (iaa_device->iaa_local_wqs->wqs) + kfree(iaa_device->iaa_local_wqs->wqs); + kfree(iaa_device->iaa_local_wqs); + } + kfree(iaa_device); } @@ -716,9 +752,14 @@ static int save_iaa_wq(struct idxd_wq *wq) if (WARN_ON(nr_iaa == 0)) return -EINVAL; - cpus_per_iaa = (nr_nodes * nr_cpus_per_node) / nr_iaa; + cpus_per_iaa = (nr_packages * nr_cpus_per_package) / nr_iaa; if (!cpus_per_iaa) cpus_per_iaa = 1; + + nr_iaa_per_package = nr_iaa / nr_packages; + if (!nr_iaa_per_package) + nr_iaa_per_package = 1; + out: return 0; } @@ -735,53 +776,45 @@ static void remove_iaa_wq(struct idxd_wq *wq) } if (nr_iaa) { - cpus_per_iaa = (nr_nodes * nr_cpus_per_node) / nr_iaa; + cpus_per_iaa = (nr_packages * nr_cpus_per_package) / nr_iaa; if (!cpus_per_iaa) cpus_per_iaa = 1; - } else + + nr_iaa_per_package = nr_iaa / nr_packages; + if (!nr_iaa_per_package) + nr_iaa_per_package = 1; + } else { cpus_per_iaa = 1; + nr_iaa_per_package = 1; + } } /*************************************************************** * Mapping IAA devices and wqs to cores with per-cpu wq_tables. ***************************************************************/ -static void wq_table_free_entry(int cpu) -{ - struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu); - - kfree(entry->wqs); - memset(entry, 0, sizeof(*entry)); -} - -static void wq_table_clear_entry(int cpu) -{ - struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu); - - entry->n_wqs = 0; - entry->cur_wq = 0; - memset(entry->wqs, 0, entry->max_wqs * sizeof(struct idxd_wq *)); -} - -static void clear_wq_table(void) +/* + * Given a cpu, find the closest IAA instance. The idea is to try to + * choose the most appropriate IAA instance for a caller and spread + * available workqueues around to clients. 
+ */ +static inline int cpu_to_iaa(int cpu) { - int cpu; - - for (cpu = 0; cpu < nr_cpus; cpu++) - wq_table_clear_entry(cpu); + int package_id, base_iaa, iaa = 0; - pr_debug("cleared wq table\n"); -} + if (!nr_packages || !nr_iaa_per_package) + return 0; -static void free_wq_table(void) -{ - int cpu; + package_id = topology_logical_package_id(cpu); + base_iaa = package_id * nr_iaa_per_package; + iaa = base_iaa + ((cpu % nr_cpus_per_package) / cpus_per_iaa); - for (cpu = 0; cpu < nr_cpus; cpu++) - wq_table_free_entry(cpu); + pr_debug("cpu = %d, package_id = %d, base_iaa = %d, iaa = %d", + cpu, package_id, base_iaa, iaa); - free_percpu(wq_table); + if (iaa >= 0 && iaa < nr_iaa) + return iaa; - pr_debug("freed wq table\n"); + return (nr_iaa - 1); } static int alloc_wq_table(int max_wqs) @@ -795,13 +828,11 @@ static int alloc_wq_table(int max_wqs) for (cpu = 0; cpu < nr_cpus; cpu++) { entry = per_cpu_ptr(wq_table, cpu); - entry->wqs = kcalloc(max_wqs, sizeof(struct wq *), GFP_KERNEL); - if (!entry->wqs) { - free_wq_table(); - return -ENOMEM; - } + entry->wqs = NULL; entry->max_wqs = max_wqs; + entry->n_wqs = 0; + entry->cur_wq = 0; } pr_debug("initialized wq table\n"); @@ -809,33 +840,27 @@ static int alloc_wq_table(int max_wqs) return 0; } -static void wq_table_add(int cpu, struct idxd_wq *wq) +static void wq_table_add(int cpu, struct wq_table_entry *iaa_local_wqs) { struct wq_table_entry *entry = per_cpu_ptr(wq_table, cpu); - if (WARN_ON(entry->n_wqs == entry->max_wqs)) - return; - - entry->wqs[entry->n_wqs++] = wq; + entry->wqs = iaa_local_wqs->wqs; + entry->max_wqs = iaa_local_wqs->max_wqs; + entry->n_wqs = iaa_local_wqs->n_wqs; + entry->cur_wq = 0; - pr_debug("%s: added iaa wq %d.%d to idx %d of cpu %d\n", __func__, + pr_debug("%s: cpu %d: added %d iaa local wqs up to wq %d.%d\n", __func__, + cpu, entry->n_wqs, entry->wqs[entry->n_wqs - 1]->idxd->id, - entry->wqs[entry->n_wqs - 1]->id, entry->n_wqs - 1, cpu); + entry->wqs[entry->n_wqs - 1]->id); } static int wq_table_add_wqs(int iaa, int cpu) { struct iaa_device *iaa_device, *found_device = NULL; - int ret = 0, cur_iaa = 0, n_wqs_added = 0; - struct idxd_device *idxd; - struct iaa_wq *iaa_wq; - struct pci_dev *pdev; - struct device *dev; + int ret = 0, cur_iaa = 0; list_for_each_entry(iaa_device, &iaa_devices, list) { - idxd = iaa_device->idxd; - pdev = idxd->pdev; - dev = &pdev->dev; if (cur_iaa != iaa) { cur_iaa++; @@ -843,7 +868,8 @@ static int wq_table_add_wqs(int iaa, int cpu) } found_device = iaa_device; - dev_dbg(dev, "getting wq from iaa_device %d, cur_iaa %d\n", + dev_dbg(&found_device->idxd->pdev->dev, + "getting wq from iaa_device %d, cur_iaa %d\n", found_device->idxd->id, cur_iaa); break; } @@ -858,29 +884,58 @@ static int wq_table_add_wqs(int iaa, int cpu) } cur_iaa = 0; - idxd = found_device->idxd; - pdev = idxd->pdev; - dev = &pdev->dev; - dev_dbg(dev, "getting wq from only iaa_device %d, cur_iaa %d\n", + dev_dbg(&found_device->idxd->pdev->dev, + "getting wq from only iaa_device %d, cur_iaa %d\n", found_device->idxd->id, cur_iaa); } - list_for_each_entry(iaa_wq, &found_device->wqs, list) { - wq_table_add(cpu, iaa_wq->wq); - pr_debug("rebalance: added wq for cpu=%d: iaa wq %d.%d\n", - cpu, iaa_wq->wq->idxd->id, iaa_wq->wq->id); - n_wqs_added++; + wq_table_add(cpu, found_device->iaa_local_wqs); + +out: + return ret; +} + +static int map_iaa_device_wqs(struct iaa_device *iaa_device) +{ + struct wq_table_entry *local; + int ret = 0, n_wqs_added = 0; + struct iaa_wq *iaa_wq; + + local = iaa_device->iaa_local_wqs; + + 
list_for_each_entry(iaa_wq, &iaa_device->wqs, list) { + if (iaa_wq->mapped && ++n_wqs_added) + continue; + + pr_debug("iaa_device %px: processing wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id); + + if (WARN_ON(local->n_wqs == local->max_wqs)) + break; + + local->wqs[local->n_wqs++] = iaa_wq->wq; + pr_debug("iaa_device %px: added local wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id); + + iaa_wq->mapped = true; + ++n_wqs_added; } - if (!n_wqs_added) { - pr_debug("couldn't find any iaa wqs!\n"); + if (!n_wqs_added && !iaa_device->n_wq) { + pr_debug("iaa_device %d: couldn't find any iaa wqs!\n", iaa_device->idxd->id); ret = -EINVAL; - goto out; } -out: + return ret; } +static void map_iaa_devices(void) +{ + struct iaa_device *iaa_device; + + list_for_each_entry(iaa_device, &iaa_devices, list) { + BUG_ON(map_iaa_device_wqs(iaa_device)); + } +} + /* * Rebalance the wq table so that given a cpu, it's easy to find the * closest IAA instance. The idea is to try to choose the most @@ -889,48 +944,42 @@ static int wq_table_add_wqs(int iaa, int cpu) */ static void rebalance_wq_table(void) { - const struct cpumask *node_cpus; - int node, cpu, iaa = -1; + int cpu, iaa; if (nr_iaa == 0) return; - pr_debug("rebalance: nr_nodes=%d, nr_cpus %d, nr_iaa %d, cpus_per_iaa %d\n", - nr_nodes, nr_cpus, nr_iaa, cpus_per_iaa); + map_iaa_devices(); - clear_wq_table(); + pr_debug("rebalance: nr_packages=%d, nr_cpus %d, nr_iaa %d, cpus_per_iaa %d\n", + nr_packages, nr_cpus, nr_iaa, cpus_per_iaa); - if (nr_iaa == 1) { - for (cpu = 0; cpu < nr_cpus; cpu++) { - if (WARN_ON(wq_table_add_wqs(0, cpu))) { - pr_debug("could not add any wqs for iaa 0 to cpu %d!\n", cpu); - return; - } + for (cpu = 0; cpu < nr_cpus; cpu++) { + iaa = cpu_to_iaa(cpu); + pr_debug("rebalance: cpu=%d iaa=%d\n", cpu, iaa); + + if (WARN_ON(iaa == -1)) { + pr_debug("rebalance (cpu_to_iaa(%d)) failed!\n", cpu); + return; } - return; + if (WARN_ON(wq_table_add_wqs(iaa, cpu))) { + pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu); + return; + } } - for_each_node_with_cpus(node) { - node_cpus = cpumask_of_node(node); - - for (cpu = 0; cpu < cpumask_weight(node_cpus); cpu++) { - int node_cpu = cpumask_nth(cpu, node_cpus); - - if (WARN_ON(node_cpu >= nr_cpu_ids)) { - pr_debug("node_cpu %d doesn't exist!\n", node_cpu); - return; - } - - if ((cpu % cpus_per_iaa) == 0) - iaa++; + pr_debug("Finished rebalance local wqs."); +} - if (WARN_ON(wq_table_add_wqs(iaa, node_cpu))) { - pr_debug("could not add any wqs for iaa %d to cpu %d!\n", iaa, cpu); - return; - } - } +static void free_wq_tables(void) +{ + if (wq_table) { + free_percpu(wq_table); + wq_table = NULL; } + + pr_debug("freed local wq table\n"); } /*************************************************************** @@ -2134,7 +2183,7 @@ static int iaa_crypto_probe(struct idxd_dev *idxd_dev) free_iaa_wq(idxd_wq_get_private(wq)); err_save: if (first_wq) - free_wq_table(); + free_wq_tables(); err_alloc: mutex_unlock(&iaa_devices_lock); idxd_drv_disable_wq(wq); @@ -2184,7 +2233,9 @@ static void iaa_crypto_remove(struct idxd_dev *idxd_dev) if (nr_iaa == 0) { iaa_crypto_enabled = false; - free_wq_table(); + free_wq_tables(); + BUG_ON(!list_empty(&iaa_devices)); + INIT_LIST_HEAD(&iaa_devices); module_put(THIS_MODULE); pr_info("iaa_crypto now DISABLED\n"); @@ -2210,16 +2261,11 @@ static struct idxd_device_driver iaa_crypto_driver = { static int __init iaa_crypto_init_module(void) { int ret = 0; - int node; + INIT_LIST_HEAD(&iaa_devices); nr_cpus = num_possible_cpus(); - 
for_each_node_with_cpus(node) - nr_nodes++; - if (!nr_nodes) { - pr_err("IAA couldn't find any nodes with cpus\n"); - return -ENODEV; - } - nr_cpus_per_node = nr_cpus / nr_nodes; + nr_cpus_per_package = topology_num_cores_per_package(); + nr_packages = topology_max_packages(); if (crypto_has_comp("deflate-generic", 0, 0)) deflate_generic_tfm = crypto_alloc_comp("deflate-generic", 0, 0);
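
[Editorial note] As a worked example of the new mapping, here is a
sketch of the arithmetic in cpu_to_iaa() above, with the topology calls
replaced by constants for an assumed 2-package system with 56 cores per
package and 4 IAA devices per package (the configuration used in the
patch 09 commit log); example_cpu_to_iaa() is hypothetical:

	/* Stand-alone restatement of cpu_to_iaa(); numbers are assumptions. */
	static int example_cpu_to_iaa(int cpu)
	{
		int nr_cpus_per_package = 56;	/* topology_num_cores_per_package() */
		int nr_iaa_per_package = 4;
		int cpus_per_iaa = 14;		/* (2 pkgs * 56 cores) / 8 IAAs */
		/* crude stand-in for topology_logical_package_id() */
		int package_id = cpu / nr_cpus_per_package;
		int base_iaa = package_id * nr_iaa_per_package;

		/* e.g. cpu 70: package 1, base_iaa 4, (70 % 56) / 14 = 1 => IAA 5 */
		return base_iaa + (cpu % nr_cpus_per_package) / cpus_per_iaa;
	}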

From patchwork Fri Feb 28 10:00:18 2025
X-Patchwork-Submitter: "Sridhar, Kanchana P"
X-Patchwork-Id: 869379
From: Kanchana P Sridhar
Subject: [PATCH v7 09/15] crypto: iaa - Distribute compress jobs from all cores to all IAAs on a package.
Date: Fri, 28 Feb 2025 02:00:18 -0800
Message-Id: <20250228100024.332528-10-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>
References: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>

This change enables processes running on any logical core on a package
to use all the IAA devices enabled on that package for compress jobs.
In other words, compressions originating from any process in a package
will be distributed in round-robin manner to the available IAA devices
on the same package.

This is not the default behavior, and is recommended only for highly
contended scenarios, when there is significant swapout/swapin activity.
The commit log describes how to enable this feature through driver
parameters, but the key thing to note is that it requires configuring
2 work-queues per IAA device (each with 64 entries), with 1 WQ used
solely for decompress jobs and the other WQ used solely for compress
jobs. Hence the above recommendation.

The main premise behind this change is to make sure that no compress
engines on any IAA device are un-utilized/under-utilized/over-utilized.
In other words, the compress engines on all IAA devices are considered
a global resource for that package, thus maximizing compression
throughput. This allows the use of all IAA devices present in a given
package for (batched) compressions originating from zswap/zram, from
all cores on this package.

A new per-cpu "global_wq_table" implements this in the iaa_crypto
driver. We can think of the global WQ per IAA as a WQ to which all
cores on that package can submit compress jobs.

To avail of this feature, the user must configure 2 WQs per IAA in
order to enable distribution of compress jobs to multiple IAA devices.
Each IAA will have 2 WQs:

 wq.0 (local WQ):  Used for decompress jobs from cores mapped by the
                   cpu_to_iaa() "even balancing of logical cores to IAA
                   devices" algorithm.

 wq.1 (global WQ): Used for compress jobs from *all* logical cores on
                   that package.

The iaa_crypto driver will place all global WQs from all same-package
IAA devices in the global_wq_table per cpu on that package. When the
driver receives a compress job, it will look up the "next" global WQ in
the cpu's global_wq_table to submit the descriptor.

The starting wq in the global_wq_table for each cpu is the global wq
associated with the IAA nearest to it, so that we stagger the starting
global wq for each process. This results in very uniform usage of all
IAAs for compress jobs.

Two new driver module parameters are added for this feature:

g_wqs_per_iaa (default 0):

  /sys/bus/dsa/drivers/crypto/g_wqs_per_iaa

  This represents the number of global WQs that can be configured per
  IAA device.
  The recommended setting is 1 to enable the use of this feature once
  the user configures 2 WQs per IAA using higher level scripts, as
  described in Documentation/driver-api/crypto/iaa/iaa-crypto.rst.

g_consec_descs_per_gwq (default 1):

  /sys/bus/dsa/drivers/crypto/g_consec_descs_per_gwq

  This represents the number of consecutive compress jobs that will be
  submitted to the same global WQ (i.e. to the same IAA device) from a
  given core, before moving to the next global WQ. The default is 1,
  which is also the recommended setting to avail of this feature.

The decompress jobs from any core will be sent to the "local" IAA,
namely the one that the driver assigns with the cpu_to_iaa() mapping
algorithm that evenly balances the assignment of logical cores to IAA
devices on a package.

On a 2-package Sapphire Rapids server where each package has 56 cores
and 4 IAA devices, this is how the compress/decompress jobs will be
mapped when the user configures 2 WQs per IAA device (which implies
wq.1 will be added to the global WQ table for each logical core on that
package):

 package(s):      2
 package0 CPU(s): 0-55,112-167
 package1 CPU(s): 56-111,168-223

Compress jobs:
--------------

package 0: iaa_crypto will send compress jobs from all cpus
(0-55,112-167) to all IAA devices on the package (iax1/iax3/iax5/iax7)
in round-robin manner:

  iaa:  iax1  iax3  iax5  iax7

package 1: iaa_crypto will send compress jobs from all cpus
(56-111,168-223) to all IAA devices on the package
(iax9/iax11/iax13/iax15) in round-robin manner:

  iaa:  iax9  iax11  iax13  iax15

Decompress jobs:
----------------

package 0:
  cpu:  0-13,112-125   14-27,126-139  28-41,140-153  42-55,154-167
  iaa:  iax1           iax3           iax5           iax7

package 1:
  cpu:  56-69,168-181  70-83,182-195  84-97,196-209  98-111,210-223
  iaa:  iax9           iax11          iax13          iax15

Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto.h      |   1 +
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 385 ++++++++++++++++++++-
 2 files changed, 378 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/intel/iaa/iaa_crypto.h b/drivers/crypto/intel/iaa/iaa_crypto.h index 72ffdf55f7b3..5f38f530c33d 100644 --- a/drivers/crypto/intel/iaa/iaa_crypto.h +++ b/drivers/crypto/intel/iaa/iaa_crypto.h @@ -91,6 +91,7 @@ struct iaa_device { struct list_head wqs; struct wq_table_entry *iaa_local_wqs; + struct wq_table_entry *iaa_global_wqs; atomic64_t comp_calls; atomic64_t comp_bytes; diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c index 40751d7c83c0..cb96897e7fed 100644 --- a/drivers/crypto/intel/iaa/iaa_crypto_main.c +++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c @@ -42,6 +42,18 @@ static struct crypto_comp *deflate_generic_tfm; /* Per-cpu lookup table for balanced wqs */ static struct wq_table_entry __percpu *wq_table = NULL; +static struct wq_table_entry **pkg_global_wq_tables = NULL; + +/* Per-cpu lookup table for global wqs shared by all cpus. */ +static struct wq_table_entry __percpu *global_wq_table = NULL; + +/* + * Per-cpu counter of consecutive descriptors allocated to + * the same wq in the global_wq_table, so that we know + * when to switch to the next wq in the global_wq_table.
+ */ +static int __percpu *num_consec_descs_per_wq = NULL; + /* Verify results of IAA compress or not */ static bool iaa_verify_compress = false; @@ -79,6 +91,16 @@ static bool async_mode = true; /* Use interrupts */ static bool use_irq; +/* Number of global wqs per iaa*/ +static int g_wqs_per_iaa = 0; + +/* + * Number of consecutive descriptors to allocate from a + * given global wq before switching to the next wq in + * the global_wq_table. + */ +static int g_consec_descs_per_gwq = 1; + static struct iaa_compression_mode *iaa_compression_modes[IAA_COMP_MODES_MAX]; LIST_HEAD(iaa_devices); @@ -180,6 +202,60 @@ static ssize_t sync_mode_store(struct device_driver *driver, } static DRIVER_ATTR_RW(sync_mode); +static ssize_t g_wqs_per_iaa_show(struct device_driver *driver, char *buf) +{ + return sprintf(buf, "%d\n", g_wqs_per_iaa); +} + +static ssize_t g_wqs_per_iaa_store(struct device_driver *driver, + const char *buf, size_t count) +{ + int ret = -EBUSY; + + mutex_lock(&iaa_devices_lock); + + if (iaa_crypto_enabled) + goto out; + + ret = kstrtoint(buf, 10, &g_wqs_per_iaa); + if (ret) + goto out; + + ret = count; +out: + mutex_unlock(&iaa_devices_lock); + + return ret; +} +static DRIVER_ATTR_RW(g_wqs_per_iaa); + +static ssize_t g_consec_descs_per_gwq_show(struct device_driver *driver, char *buf) +{ + return sprintf(buf, "%d\n", g_consec_descs_per_gwq); +} + +static ssize_t g_consec_descs_per_gwq_store(struct device_driver *driver, + const char *buf, size_t count) +{ + int ret = -EBUSY; + + mutex_lock(&iaa_devices_lock); + + if (iaa_crypto_enabled) + goto out; + + ret = kstrtoint(buf, 10, &g_consec_descs_per_gwq); + if (ret) + goto out; + + ret = count; +out: + mutex_unlock(&iaa_devices_lock); + + return ret; +} +static DRIVER_ATTR_RW(g_consec_descs_per_gwq); + /**************************** * Driver compression modes. ****************************/ @@ -465,7 +541,7 @@ static void remove_device_compression_modes(struct iaa_device *iaa_device) ***********************************************************/ static struct iaa_device *iaa_device_alloc(struct idxd_device *idxd) { - struct wq_table_entry *local; + struct wq_table_entry *local, *global; struct iaa_device *iaa_device; iaa_device = kzalloc(sizeof(*iaa_device), GFP_KERNEL); @@ -488,6 +564,20 @@ static struct iaa_device *iaa_device_alloc(struct idxd_device *idxd) local->max_wqs = iaa_device->idxd->max_wqs; local->n_wqs = 0; + /* IAA device's global wqs. 
*/ + iaa_device->iaa_global_wqs = kzalloc(sizeof(struct wq_table_entry), GFP_KERNEL); + if (!iaa_device->iaa_global_wqs) + goto err; + + global = iaa_device->iaa_global_wqs; + + global->wqs = kzalloc(iaa_device->idxd->max_wqs * sizeof(struct wq *), GFP_KERNEL); + if (!global->wqs) + goto err; + + global->max_wqs = iaa_device->idxd->max_wqs; + global->n_wqs = 0; + INIT_LIST_HEAD(&iaa_device->wqs); return iaa_device; @@ -499,6 +589,8 @@ static struct iaa_device *iaa_device_alloc(struct idxd_device *idxd) kfree(iaa_device->iaa_local_wqs->wqs); kfree(iaa_device->iaa_local_wqs); } + if (iaa_device->iaa_global_wqs) + kfree(iaa_device->iaa_global_wqs); kfree(iaa_device); } @@ -616,6 +708,12 @@ static void free_iaa_device(struct iaa_device *iaa_device) kfree(iaa_device->iaa_local_wqs); } + if (iaa_device->iaa_global_wqs) { + if (iaa_device->iaa_global_wqs->wqs) + kfree(iaa_device->iaa_global_wqs->wqs); + kfree(iaa_device->iaa_global_wqs); + } + kfree(iaa_device); } @@ -817,6 +915,58 @@ static inline int cpu_to_iaa(int cpu) return (nr_iaa - 1); } +static void free_global_wq_table(void) +{ + if (global_wq_table) { + free_percpu(global_wq_table); + global_wq_table = NULL; + } + + if (num_consec_descs_per_wq) { + free_percpu(num_consec_descs_per_wq); + num_consec_descs_per_wq = NULL; + } + + pr_debug("freed global wq table\n"); +} + +static int pkg_global_wq_tables_alloc(void) +{ + int i, j; + + pkg_global_wq_tables = kzalloc(nr_packages * sizeof(*pkg_global_wq_tables), GFP_KERNEL); + if (!pkg_global_wq_tables) + return -ENOMEM; + + for (i = 0; i < nr_packages; ++i) { + pkg_global_wq_tables[i] = kzalloc(sizeof(struct wq_table_entry), GFP_KERNEL); + + if (!pkg_global_wq_tables[i]) { + for (j = 0; j < i; ++j) + kfree(pkg_global_wq_tables[j]); + kfree(pkg_global_wq_tables); + pkg_global_wq_tables = NULL; + return -ENOMEM; + } + pkg_global_wq_tables[i]->wqs = NULL; + } + + return 0; +} + +static void pkg_global_wq_tables_dealloc(void) +{ + int i; + + for (i = 0; i < nr_packages; ++i) { + if (pkg_global_wq_tables[i]->wqs) + kfree(pkg_global_wq_tables[i]->wqs); + kfree(pkg_global_wq_tables[i]); + } + kfree(pkg_global_wq_tables); + pkg_global_wq_tables = NULL; +} + static int alloc_wq_table(int max_wqs) { struct wq_table_entry *entry; @@ -835,6 +985,35 @@ static int alloc_wq_table(int max_wqs) entry->cur_wq = 0; } + global_wq_table = alloc_percpu(struct wq_table_entry); + if (!global_wq_table) + return 0; + + for (cpu = 0; cpu < nr_cpus; cpu++) { + entry = per_cpu_ptr(global_wq_table, cpu); + + entry->wqs = NULL; + entry->max_wqs = max_wqs; + entry->n_wqs = 0; + entry->cur_wq = 0; + } + + num_consec_descs_per_wq = alloc_percpu(int); + if (!num_consec_descs_per_wq) { + free_global_wq_table(); + return 0; + } + + for (cpu = 0; cpu < nr_cpus; cpu++) { + int *num_consec_descs = per_cpu_ptr(num_consec_descs_per_wq, cpu); + *num_consec_descs = 0; + } + + if (pkg_global_wq_tables_alloc()) { + free_global_wq_table(); + return 0; + } + pr_debug("initialized wq table\n"); return 0; @@ -895,13 +1074,120 @@ static int wq_table_add_wqs(int iaa, int cpu) return ret; } +static void pkg_global_wq_tables_reinit(void) +{ + int i, cur_iaa = 0, pkg = 0, nr_pkg_wqs = 0; + struct iaa_device *iaa_device; + struct wq_table_entry *global; + + if (!pkg_global_wq_tables) + return; + + /* Reallocate per-package wqs. */ + list_for_each_entry(iaa_device, &iaa_devices, list) { + global = iaa_device->iaa_global_wqs; + nr_pkg_wqs += global->n_wqs; + + if (++cur_iaa == nr_iaa_per_package) { + nr_pkg_wqs = nr_pkg_wqs ? 
max_t(int, iaa_device->idxd->max_wqs, nr_pkg_wqs) : 0; + + if (pkg_global_wq_tables[pkg]->wqs) { + kfree(pkg_global_wq_tables[pkg]->wqs); + pkg_global_wq_tables[pkg]->wqs = NULL; + } + + if (nr_pkg_wqs) + pkg_global_wq_tables[pkg]->wqs = kzalloc(nr_pkg_wqs * + sizeof(struct wq *), + GFP_KERNEL); + + pkg_global_wq_tables[pkg]->n_wqs = 0; + pkg_global_wq_tables[pkg]->cur_wq = 0; + pkg_global_wq_tables[pkg]->max_wqs = nr_pkg_wqs; + + if (++pkg == nr_packages) + break; + cur_iaa = 0; + nr_pkg_wqs = 0; + } + } + + pkg = 0; + cur_iaa = 0; + + /* Re-initialize per-package wqs. */ + list_for_each_entry(iaa_device, &iaa_devices, list) { + global = iaa_device->iaa_global_wqs; + + if (pkg_global_wq_tables[pkg]->wqs) + for (i = 0; i < global->n_wqs; ++i) + pkg_global_wq_tables[pkg]->wqs[pkg_global_wq_tables[pkg]->n_wqs++] = global->wqs[i]; + + pr_debug("pkg_global_wq_tables[%d] has %d wqs", pkg, pkg_global_wq_tables[pkg]->n_wqs); + + if (++cur_iaa == nr_iaa_per_package) { + if (++pkg == nr_packages) + break; + cur_iaa = 0; + } + } +} + +static void global_wq_table_add(int cpu, struct wq_table_entry *pkg_global_wq_table) +{ + struct wq_table_entry *entry = per_cpu_ptr(global_wq_table, cpu); + + /* This could be NULL. */ + entry->wqs = pkg_global_wq_table->wqs; + entry->max_wqs = pkg_global_wq_table->max_wqs; + entry->n_wqs = pkg_global_wq_table->n_wqs; + entry->cur_wq = 0; + + if (entry->wqs) + pr_debug("%s: cpu %d: added %d iaa global wqs up to wq %d.%d\n", __func__, + cpu, entry->n_wqs, + entry->wqs[entry->n_wqs - 1]->idxd->id, + entry->wqs[entry->n_wqs - 1]->id); +} + +static void global_wq_table_set_start_wq(int cpu) +{ + struct wq_table_entry *entry = per_cpu_ptr(global_wq_table, cpu); + int start_wq = g_wqs_per_iaa * (cpu_to_iaa(cpu) % nr_iaa_per_package); + + if ((start_wq >= 0) && (start_wq < entry->n_wqs)) + entry->cur_wq = start_wq; +} + +static void global_wq_table_add_wqs(void) +{ + int cpu; + + if (!pkg_global_wq_tables) + return; + + for (cpu = 0; cpu < nr_cpus; cpu += nr_cpus_per_package) { + /* cpu's on the same package get the same global_wq_table. 
*/ + int package_id = topology_logical_package_id(cpu); + int pkg_cpu; + + for (pkg_cpu = cpu; pkg_cpu < cpu + nr_cpus_per_package; ++pkg_cpu) { + if (pkg_global_wq_tables[package_id]->n_wqs > 0) { + global_wq_table_add(pkg_cpu, pkg_global_wq_tables[package_id]); + global_wq_table_set_start_wq(pkg_cpu); + } + } + } +} + static int map_iaa_device_wqs(struct iaa_device *iaa_device) { - struct wq_table_entry *local; + struct wq_table_entry *local, *global; int ret = 0, n_wqs_added = 0; struct iaa_wq *iaa_wq; local = iaa_device->iaa_local_wqs; + global = iaa_device->iaa_global_wqs; list_for_each_entry(iaa_wq, &iaa_device->wqs, list) { if (iaa_wq->mapped && ++n_wqs_added) @@ -909,11 +1195,18 @@ static int map_iaa_device_wqs(struct iaa_device *iaa_device) pr_debug("iaa_device %px: processing wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id); - if (WARN_ON(local->n_wqs == local->max_wqs)) - break; + if ((!n_wqs_added || ((n_wqs_added + g_wqs_per_iaa) < iaa_device->n_wq)) && + (local->n_wqs < local->max_wqs)) { + + local->wqs[local->n_wqs++] = iaa_wq->wq; + pr_debug("iaa_device %px: added local wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id); + } else { + if (WARN_ON(global->n_wqs == global->max_wqs)) + break; - local->wqs[local->n_wqs++] = iaa_wq->wq; - pr_debug("iaa_device %px: added local wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id); + global->wqs[global->n_wqs++] = iaa_wq->wq; + pr_debug("iaa_device %px: added global wq %d.%d\n", iaa_device, iaa_device->idxd->id, iaa_wq->wq->id); + } iaa_wq->mapped = true; ++n_wqs_added; @@ -969,6 +1262,10 @@ static void rebalance_wq_table(void) } } + if (iaa_crypto_enabled && pkg_global_wq_tables) { + pkg_global_wq_tables_reinit(); + global_wq_table_add_wqs(); + } pr_debug("Finished rebalance local wqs."); } @@ -979,7 +1276,17 @@ static void free_wq_tables(void) wq_table = NULL; } - pr_debug("freed local wq table\n"); + if (global_wq_table) { + free_percpu(global_wq_table); + global_wq_table = NULL; + } + + if (num_consec_descs_per_wq) { + free_percpu(num_consec_descs_per_wq); + num_consec_descs_per_wq = NULL; + } + + pr_debug("freed wq tables\n"); } /*************************************************************** @@ -1002,6 +1309,35 @@ static struct idxd_wq *wq_table_next_wq(int cpu) return entry->wqs[entry->cur_wq]; } +/* + * Caller should make sure to call only if the + * per_cpu_ptr "global_wq_table" is non-NULL + * and has at least one wq configured. + */ +static struct idxd_wq *global_wq_table_next_wq(int cpu) +{ + struct wq_table_entry *entry = per_cpu_ptr(global_wq_table, cpu); + int *num_consec_descs = per_cpu_ptr(num_consec_descs_per_wq, cpu); + + /* + * Fall-back to local IAA's wq if there were no global wqs configured + * for any IAA device, or if there were problems in setting up global + * wqs for this cpu's package. + */ + if (!entry->wqs) + return wq_table_next_wq(cpu); + + if ((*num_consec_descs) == g_consec_descs_per_gwq) { + if (++entry->cur_wq >= entry->n_wqs) + entry->cur_wq = 0; + *num_consec_descs = 0; + } + + ++(*num_consec_descs); + + return entry->wqs[entry->cur_wq]; +} + /************************************************* * Core iaa_crypto compress/decompress functions. 
*************************************************/ @@ -1563,6 +1899,7 @@ static int iaa_comp_acompress(struct acomp_req *req) struct idxd_wq *wq; struct device *dev; int order = -1; + struct wq_table_entry *entry; compression_ctx = crypto_tfm_ctx(tfm); @@ -1581,8 +1918,15 @@ static int iaa_comp_acompress(struct acomp_req *req) disable_async = true; cpu = get_cpu(); - wq = wq_table_next_wq(cpu); + entry = per_cpu_ptr(global_wq_table, cpu); + + if (!entry || !entry->wqs || entry->n_wqs == 0) { + wq = wq_table_next_wq(cpu); + } else { + wq = global_wq_table_next_wq(cpu); + } put_cpu(); + if (!wq) { pr_debug("no wq configured for cpu=%d\n", cpu); return -ENODEV; @@ -2233,6 +2577,7 @@ static void iaa_crypto_remove(struct idxd_dev *idxd_dev) if (nr_iaa == 0) { iaa_crypto_enabled = false; + pkg_global_wq_tables_dealloc(); free_wq_tables(); BUG_ON(!list_empty(&iaa_devices)); INIT_LIST_HEAD(&iaa_devices); @@ -2302,6 +2647,20 @@ static int __init iaa_crypto_init_module(void) goto err_sync_attr_create; } + ret = driver_create_file(&iaa_crypto_driver.drv, + &driver_attr_g_wqs_per_iaa); + if (ret) { + pr_debug("IAA g_wqs_per_iaa attr creation failed\n"); + goto err_g_wqs_per_iaa_attr_create; + } + + ret = driver_create_file(&iaa_crypto_driver.drv, + &driver_attr_g_consec_descs_per_gwq); + if (ret) { + pr_debug("IAA g_consec_descs_per_gwq attr creation failed\n"); + goto err_g_consec_descs_per_gwq_attr_create; + } + if (iaa_crypto_debugfs_init()) pr_warn("debugfs init failed, stats not available\n"); @@ -2309,6 +2668,12 @@ static int __init iaa_crypto_init_module(void) out: return ret; +err_g_consec_descs_per_gwq_attr_create: + driver_remove_file(&iaa_crypto_driver.drv, + &driver_attr_g_wqs_per_iaa); +err_g_wqs_per_iaa_attr_create: + driver_remove_file(&iaa_crypto_driver.drv, + &driver_attr_sync_mode); err_sync_attr_create: driver_remove_file(&iaa_crypto_driver.drv, &driver_attr_verify_compress); @@ -2332,6 +2697,10 @@ static void __exit iaa_crypto_cleanup_module(void) &driver_attr_sync_mode); driver_remove_file(&iaa_crypto_driver.drv, &driver_attr_verify_compress); + driver_remove_file(&iaa_crypto_driver.drv, + &driver_attr_g_wqs_per_iaa); + driver_remove_file(&iaa_crypto_driver.drv, + &driver_attr_g_consec_descs_per_gwq); idxd_driver_unregister(&iaa_crypto_driver); iaa_aecs_cleanup_fixed(); crypto_free_comp(deflate_generic_tfm);
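As a recap of the WQ-selection policy the patch above implements, the two mappings can be modeled in a few lines of self-contained C. This is an illustrative sketch only: the topology constants, the per-package cpu numbering and the helper names below are simplifications, not the driver's actual code or data structures.

#include <stdio.h>

/* Illustrative topology: 56 cores and 4 IAA devices per package. */
enum { CPUS_PER_PKG = 56, IAAS_PER_PKG = 4, CONSEC_DESCS = 1 };

/*
 * Decompress path: each core uses its "local" IAA, via an even split
 * of the package's cores across the package's IAA devices (the role
 * played by cpu_to_iaa() in the driver).
 */
static int local_iaa(int pkg, int cpu_in_pkg)
{
	int cpus_per_iaa = CPUS_PER_PKG / IAAS_PER_PKG;	/* 14 */

	return pkg * IAAS_PER_PKG + cpu_in_pkg / cpus_per_iaa;
}

/*
 * Compress path: round-robin over the package's global WQs, moving to
 * the next one after CONSEC_DESCS consecutive jobs (the role played by
 * global_wq_table_next_wq() and g_consec_descs_per_gwq).
 */
static int next_global_iaa(int pkg, int *cur_wq, int *consec)
{
	if (++(*consec) > CONSEC_DESCS) {
		*cur_wq = (*cur_wq + 1) % IAAS_PER_PKG;
		*consec = 1;
	}
	return pkg * IAAS_PER_PKG + *cur_wq;
}

int main(void)
{
	int cur_wq = 0, consec = 0, job;

	for (job = 0; job < 8; job++)
		printf("compress job %d from package-0 core 0 -> IAA %d\n",
		       job, next_global_iaa(0, &cur_wq, &consec));

	printf("decompress on package-0 core 20 -> IAA %d\n",
	       local_iaa(0, 20));
	return 0;
}

With CONSEC_DESCS = 1, consecutive compress jobs cycle 0, 1, 2, 3, 0, ... across the package's IAA devices, while core 20's decompress jobs always go to IAA 1, matching the example tables in the commit message above.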
From patchwork Fri Feb 28 10:00:20 2025
X-Patchwork-Id: 869380
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, ying.huang@linux.alibaba.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v7 11/15] crypto: iaa - Fix for "deflate_generic_tfm" global being accessed without locks.
Date: Fri, 28 Feb 2025 02:00:20 -0800
Message-Id: <20250228100024.332528-12-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>
References: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>

The mainline implementation of "deflate_generic_decompress" has a bug in the usage of this global variable:

  static struct crypto_comp *deflate_generic_tfm;

The "deflate_generic_tfm" is allocated at module init time, and freed during module cleanup.
Any call to software decompress, for instance when descriptor allocation or job submission fails, triggers this bug in the deflate_generic_decompress() procedure: the access of "deflate_generic_tfm" there is unprotected.

While stress testing workloads under high memory pressure, with 1 IAA device and "deflate-iaa" as the compressor, the descriptor allocation times out and the software fallback route is taken. With multiple processes calling:

  ret = crypto_comp_decompress(deflate_generic_tfm, src, req->slen, dst, &req->dlen);

we end up with data corruption that results in req->dlen being larger than PAGE_SIZE, and zswap_decompress() subsequently raises a kernel bug. Under high contention and memory pressure, this bug manifests with high likelihood.

This is resolved by adding a mutex that is locked before accessing "deflate_generic_tfm" and unlocked after the crypto_comp call is done.
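The failure mode and the fix can be modeled in isolation with a short, self-contained userspace program. This is only an illustrative sketch: the "codec" structure and all names below are stand-ins, not the kernel's crypto API.

#include <pthread.h>
#include <stdio.h>

/* Stand-in for the single, shared crypto_comp tfm. */
struct codec {
	long scratch;	/* internal state, unsafe under concurrent use */
};

static struct codec shared_codec;	/* like deflate_generic_tfm */
static pthread_mutex_t codec_lock = PTHREAD_MUTEX_INITIALIZER;

/* Can corrupt its result if two threads run it concurrently. */
static int codec_decompress(struct codec *c, long job)
{
	c->scratch = job;
	return c->scratch == job ? 0 : -1;
}

static void *fallback_path(void *arg)
{
	long job = (long)arg;

	/* The fix: serialize every use of the shared context. */
	pthread_mutex_lock(&codec_lock);
	if (codec_decompress(&shared_codec, job))
		fprintf(stderr, "job %ld: corrupted result\n", job);
	pthread_mutex_unlock(&codec_lock);

	return NULL;
}

int main(void)
{
	pthread_t threads[4];
	long i;

	for (i = 0; i < 4; i++)
		pthread_create(&threads[i], NULL, fallback_path, (void *)i);
	for (i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);

	return 0;
}

Without the lock/unlock pair, two threads can interleave inside codec_decompress() and observe each other's scratch state; with it, the shared context behaves as if each caller owned it exclusively.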
Signed-off-by: Kanchana P Sridhar
---
 drivers/crypto/intel/iaa/iaa_crypto_main.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/crypto/intel/iaa/iaa_crypto_main.c b/drivers/crypto/intel/iaa/iaa_crypto_main.c index 7503fafca279..2a994f307679 100644 --- a/drivers/crypto/intel/iaa/iaa_crypto_main.c +++ b/drivers/crypto/intel/iaa/iaa_crypto_main.c @@ -105,6 +105,7 @@ static struct iaa_compression_mode *iaa_compression_modes[IAA_COMP_MODES_MAX]; LIST_HEAD(iaa_devices); DEFINE_MUTEX(iaa_devices_lock); +DEFINE_MUTEX(deflate_generic_tfm_lock); /* If enabled, IAA hw crypto algos are registered, unavailable otherwise */ static bool iaa_crypto_enabled; @@ -1407,6 +1408,9 @@ static int deflate_generic_decompress(struct acomp_req *req) int ret; req->dlen = PAGE_SIZE; + + mutex_lock(&deflate_generic_tfm_lock); + src = kmap_local_page(sg_page(req->src)) + req->src->offset; dst = kmap_local_page(sg_page(req->dst)) + req->dst->offset; @@ -1416,6 +1420,8 @@ kunmap_local(src); kunmap_local(dst); + mutex_unlock(&deflate_generic_tfm_lock); + update_total_sw_decomp_calls(); return ret;

From patchwork Fri Feb 28 10:00:23 2025
X-Patchwork-Id: 869378
From: Kanchana P Sridhar
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com, ryan.roberts@arm.com, 21cnbao@gmail.com, ying.huang@linux.alibaba.com, akpm@linux-foundation.org, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au, davem@davemloft.net, clabbe@baylibre.com, ardb@kernel.org, ebiggers@google.com, surenb@google.com, kristen.c.accardi@intel.com
Cc: wajdi.k.feghali@intel.com, vinodh.gopal@intel.com, kanchana.p.sridhar@intel.com
Subject: [PATCH v7 14/15] mm: zswap: Restructure & simplify zswap_store() to make it amenable for batching.
Date: Fri, 28 Feb 2025 02:00:23 -0800
Message-Id: <20250228100024.332528-15-kanchana.p.sridhar@intel.com>
In-Reply-To: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>
References: <20250228100024.332528-1-kanchana.p.sridhar@intel.com>

This patch introduces zswap_store_folio(), which performs for all pages in a folio the work that zswap_store_page() previously did for a single page. This allows us to move the loop over the folio's pages from zswap_store() to zswap_store_folio().

zswap_store_folio() starts by allocating all zswap entries required to store the folio. Next, it iterates over the folio's pages and, for each page, calls zswap_compress(), adds the zswap entry to the xarray and LRU, charges zswap memory and increments zswap stats.

The error handling and cleanup required for all failure scenarios that can occur while storing a folio in zswap are now consolidated under a "store_folio_failed" label in zswap_store_folio().
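The allocation-time sentinel that makes this single cleanup label work can be modeled with a short, self-contained C sketch. This is illustrative only: the entry type, the sentinel value and the helpers below are stand-ins for the zswap internals, and entries already published to the xarray, which zswap_store()'s own failure path cleans up, are omitted here.

#include <stdbool.h>
#include <stdlib.h>

#define HANDLE_INVALID ((unsigned long)-1)	/* models ERR_PTR(-EINVAL) */

struct entry {
	unsigned long handle;	/* set only after a successful "compress" */
};

/* Stand-in for zswap_compress(): fails for one page to show the unwind. */
static bool compress_one(struct entry *e, long index)
{
	if (index == 2)
		return false;
	e->handle = 0x1000 + index;	/* pretend backing storage was allocated */
	return true;
}

static bool store_folio(long nr_pages)
{
	struct entry **entries;
	long index, nr_alloc = nr_pages;

	entries = malloc(nr_pages * sizeof(*entries));
	if (!entries)
		return false;

	/* Pass 1: allocate every entry; handles start out as sentinels. */
	for (index = 0; index < nr_pages; index++) {
		entries[index] = malloc(sizeof(*entries[index]));
		if (!entries[index]) {
			nr_alloc = index;
			goto store_folio_failed;
		}
		entries[index]->handle = HANDLE_INVALID;
	}

	/* Pass 2: compress; a handle leaves the sentinel only on success. */
	for (index = 0; index < nr_pages; index++)
		if (!compress_one(entries[index], index))
			goto store_folio_failed;

	free(entries);
	return true;

store_folio_failed:
	/* One unwind path: the sentinel says who owns backing storage. */
	for (index = 0; index < nr_alloc; index++) {
		if (entries[index]->handle != HANDLE_INVALID) {
			/* a real implementation would free the storage here */
		}
		free(entries[index]);
	}
	free(entries);
	return false;
}

The invariant, as in the patch, is that an entry's handle is changed away from the sentinel only after its backing allocation succeeds, so one loop can both release valid handles and free every entry regardless of where the failure occurred.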
These changes facilitate developing support for compress batching in zswap_store_folio(). Signed-off-by: Kanchana P Sridhar --- mm/zswap.c | 166 +++++++++++++++++++++++++++++++---------------------- 1 file changed, 98 insertions(+), 68 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index 6aa602b8514e..ab9167220cb6 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1580,81 +1580,115 @@ static void shrink_worker(struct work_struct *w) * main API **********************************/ -static bool zswap_store_page(struct page *page, - struct obj_cgroup *objcg, - struct zswap_pool *pool) +/* + * Store all pages in a folio. + * + * The error handling from all failure points is consolidated to the + * "store_folio_failed" label, based on the initialization of the zswap entries' + * handles to ERR_PTR(-EINVAL) at allocation time, and the fact that the + * entry's handle is subsequently modified only upon a successful zpool_malloc() + * after the page is compressed. + */ +static bool zswap_store_folio(struct folio *folio, + struct obj_cgroup *objcg, + struct zswap_pool *pool) { - swp_entry_t page_swpentry = page_swap_entry(page); - struct zswap_entry *entry, *old; + long index, from_index = 0, nr_pages = folio_nr_pages(folio); + struct zswap_entry **entries = NULL; + int node_id = folio_nid(folio); - /* allocate entry */ - entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page)); - if (!entry) { - zswap_reject_kmemcache_fail++; + entries = kmalloc(nr_pages * sizeof(*entries), GFP_KERNEL); + if (!entries) return false; - } - if (!zswap_compress(page, entry, pool)) - goto compress_failed; + for (index = from_index; index < nr_pages; ++index) { + entries[index] = zswap_entry_cache_alloc(GFP_KERNEL, node_id); - old = xa_store(swap_zswap_tree(page_swpentry), - swp_offset(page_swpentry), - entry, GFP_KERNEL); - if (xa_is_err(old)) { - int err = xa_err(old); + if (!entries[index]) { + zswap_reject_kmemcache_fail++; + nr_pages = index; + goto store_folio_failed; + } - WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err); - zswap_reject_alloc_fail++; - goto store_failed; + entries[index]->handle = (unsigned long)ERR_PTR(-EINVAL); } - /* - * We may have had an existing entry that became stale when - * the folio was redirtied and now the new version is being - * swapped out. Get rid of the old. - */ - if (old) - zswap_entry_free(old); + for (index = from_index; index < nr_pages; ++index) { + struct page *page = folio_page(folio, index); + swp_entry_t page_swpentry = page_swap_entry(page); + struct zswap_entry *old, *entry = entries[index]; - /* - * The entry is successfully compressed and stored in the tree, there is - * no further possibility of failure. Grab refs to the pool and objcg, - * charge zswap memory, and increment zswap_stored_pages. - * The opposite actions will be performed by zswap_entry_free() - * when the entry is removed from the tree. - */ - zswap_pool_get(pool); - if (objcg) { - obj_cgroup_get(objcg); - obj_cgroup_charge_zswap(objcg, entry->length); - } - atomic_long_inc(&zswap_stored_pages); + if (!zswap_compress(page, entry, pool)) { + from_index = index; + goto store_folio_failed; + } - /* - * We finish initializing the entry while it's already in xarray. - * This is safe because: - * - * 1. Concurrent stores and invalidations are excluded by folio lock. - * - * 2. Writeback is excluded by the entry not being on the LRU yet. - * The publishing order matters to prevent writeback from seeing - * an incoherent entry. 
- */ - entry->pool = pool; - entry->swpentry = page_swpentry; - entry->objcg = objcg; - entry->referenced = true; - if (entry->length) { - INIT_LIST_HEAD(&entry->lru); - zswap_lru_add(&zswap_list_lru, entry); + old = xa_store(swap_zswap_tree(page_swpentry), + swp_offset(page_swpentry), + entry, GFP_KERNEL); + if (xa_is_err(old)) { + int err = xa_err(old); + + WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err); + zswap_reject_alloc_fail++; + from_index = index; + goto store_folio_failed; + } + + /* + * We may have had an existing entry that became stale when + * the folio was redirtied and now the new version is being + * swapped out. Get rid of the old. + */ + if (old) + zswap_entry_free(old); + + /* + * The entry is successfully compressed and stored in the tree, there is + * no further possibility of failure. Grab refs to the pool and objcg, + * charge zswap memory, and increment zswap_stored_pages. + * The opposite actions will be performed by zswap_entry_free() + * when the entry is removed from the tree. + */ + zswap_pool_get(pool); + if (objcg) { + obj_cgroup_get(objcg); + obj_cgroup_charge_zswap(objcg, entry->length); + } + atomic_long_inc(&zswap_stored_pages); + + /* + * We finish initializing the entry while it's already in xarray. + * This is safe because: + * + * 1. Concurrent stores and invalidations are excluded by folio lock. + * + * 2. Writeback is excluded by the entry not being on the LRU yet. + * The publishing order matters to prevent writeback from seeing + * an incoherent entry. + */ + entry->pool = pool; + entry->swpentry = page_swpentry; + entry->objcg = objcg; + entry->referenced = true; + if (entry->length) { + INIT_LIST_HEAD(&entry->lru); + zswap_lru_add(&zswap_list_lru, entry); + } } + kfree(entries); return true; -store_failed: - zpool_free(pool->zpool, entry->handle); -compress_failed: - zswap_entry_cache_free(entry); +store_folio_failed: + for (index = from_index; index < nr_pages; ++index) { + if (!IS_ERR_VALUE(entries[index]->handle)) + zpool_free(pool->zpool, entries[index]->handle); + + zswap_entry_cache_free(entries[index]); + } + + kfree(entries); return false; } @@ -1666,7 +1700,6 @@ bool zswap_store(struct folio *folio) struct mem_cgroup *memcg = NULL; struct zswap_pool *pool; bool ret = false; - long index; VM_WARN_ON_ONCE(!folio_test_locked(folio)); VM_WARN_ON_ONCE(!folio_test_swapcache(folio)); @@ -1700,12 +1733,8 @@ bool zswap_store(struct folio *folio) mem_cgroup_put(memcg); } - for (index = 0; index < nr_pages; ++index) { - struct page *page = folio_page(folio, index); - - if (!zswap_store_page(page, objcg, pool)) - goto put_pool; - } + if (!zswap_store_folio(folio, objcg, pool)) + goto put_pool; if (objcg) count_objcg_events(objcg, ZSWPOUT, nr_pages); @@ -1732,6 +1761,7 @@ bool zswap_store(struct folio *folio) pgoff_t offset = swp_offset(swp); struct zswap_entry *entry; struct xarray *tree; + long index; for (index = 0; index < nr_pages; ++index) { tree = swap_zswap_tree(swp_entry(type, offset + index));