@@ -248,43 +248,65 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
**********************************/
static void __zswap_pool_empty(struct percpu_ref *ref);
+/*
+ * The per-cpu pool->acomp_ctx is zero-initialized on allocation. This makes
+ * it easy for different error conditions/cleanup related to the acomp_ctx
+ * to be handled by acomp_ctx_dealloc():
+ * - Errors during zswap_cpu_comp_prepare().
+ * - Partial success/error of cpuhp_state_add_instance() call in
+ * zswap_pool_create(). Only some cores could have executed
+ * zswap_cpu_comp_prepare(), not others.
+ * - Cleanup of acomp_ctx resources on all cores in zswap_pool_destroy().
+ */
+static void acomp_ctx_dealloc(struct crypto_acomp_ctx *acomp_ctx)
+{
+ if (IS_ERR_OR_NULL(acomp_ctx))
+ return;
+
+ if (!IS_ERR_OR_NULL(acomp_ctx->req))
+ acomp_request_free(acomp_ctx->req);
+ if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
+ crypto_free_acomp(acomp_ctx->acomp);
+ kfree(acomp_ctx->buffer);
+}
+
static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
{
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
- struct crypto_acomp *acomp = NULL;
- struct acomp_req *req = NULL;
- u8 *buffer = NULL;
- int ret;
+ int ret = -ENOMEM;
- buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
- if (!buffer) {
- ret = -ENOMEM;
- goto fail;
- }
+ /*
+ * The per-CPU pool->acomp_ctx is zero-initialized on allocation.
+ * Even though we delete the multi state instance right after successful
+ * addition of the instance in zswap_pool_create(), we cannot eliminate
+ * the possibility of the CPU going through offline-online transitions.
+ * If this does happen, we check if the acomp_ctx has already been
+ * initialized, and return.
+ */
+ if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
+ return 0;
- acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
- if (IS_ERR(acomp)) {
+ acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+ if (!acomp_ctx->buffer)
+ return ret;
+
+ acomp_ctx->acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
+ if (IS_ERR(acomp_ctx->acomp)) {
pr_err("could not alloc crypto acomp %s : %ld\n",
- pool->tfm_name, PTR_ERR(acomp));
- ret = PTR_ERR(acomp);
+ pool->tfm_name, PTR_ERR(acomp_ctx->acomp));
+ ret = PTR_ERR(acomp_ctx->acomp);
goto fail;
}
+ acomp_ctx->is_sleepable = acomp_is_async(acomp_ctx->acomp);
- req = acomp_request_alloc(acomp);
- if (!req) {
+ acomp_ctx->req = acomp_request_alloc(acomp_ctx->acomp);
+ if (!acomp_ctx->req) {
pr_err("could not alloc crypto acomp_request %s\n",
pool->tfm_name);
- ret = -ENOMEM;
goto fail;
}
- /*
- * Only hold the mutex after completing allocations, otherwise we may
- * recurse into zswap through reclaim and attempt to hold the mutex
- * again resulting in a deadlock.
- */
- mutex_lock(&acomp_ctx->mutex);
crypto_init_wait(&acomp_ctx->wait);
/*
@@ -292,56 +314,17 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
* crypto_wait_req(); if the backend of acomp is scomp, the callback
* won't be called, crypto_wait_req() will return without blocking.
*/
- acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ acomp_request_set_callback(acomp_ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG,
crypto_req_done, &acomp_ctx->wait);
- acomp_ctx->buffer = buffer;
- acomp_ctx->acomp = acomp;
- acomp_ctx->is_sleepable = acomp_is_async(acomp);
- acomp_ctx->req = req;
- mutex_unlock(&acomp_ctx->mutex);
+ mutex_init(&acomp_ctx->mutex);
return 0;
fail:
- if (acomp)
- crypto_free_acomp(acomp);
- kfree(buffer);
+ acomp_ctx_dealloc(acomp_ctx);
return ret;
}
-static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
-{
- struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
- struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
- struct acomp_req *req;
- struct crypto_acomp *acomp;
- u8 *buffer;
-
- if (IS_ERR_OR_NULL(acomp_ctx))
- return 0;
-
- mutex_lock(&acomp_ctx->mutex);
- req = acomp_ctx->req;
- acomp = acomp_ctx->acomp;
- buffer = acomp_ctx->buffer;
- acomp_ctx->req = NULL;
- acomp_ctx->acomp = NULL;
- acomp_ctx->buffer = NULL;
- mutex_unlock(&acomp_ctx->mutex);
-
- /*
- * Do the actual freeing after releasing the mutex to avoid subtle
- * locking dependencies causing deadlocks.
- */
- if (!IS_ERR_OR_NULL(req))
- acomp_request_free(req);
- if (!IS_ERR_OR_NULL(acomp))
- crypto_free_acomp(acomp);
- kfree(buffer);
-
- return 0;
-}
-
static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
{
struct zswap_pool *pool;
@@ -375,19 +358,43 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
- pool->acomp_ctx = alloc_percpu(*pool->acomp_ctx);
+ /* Many things rely on the zero-initialization. */
+ pool->acomp_ctx = alloc_percpu_gfp(*pool->acomp_ctx,
+ GFP_KERNEL | __GFP_ZERO);
if (!pool->acomp_ctx) {
pr_err("percpu alloc failed\n");
goto error;
}
- for_each_possible_cpu(cpu)
- mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
-
+ /*
+ * This is serialized against CPU hotplug operations. Hence, cores
+ * cannot be offlined until this finishes.
+ * In case of errors, we need to goto "ref_fail" instead of "error"
+ * because there is no longer a teardown callback registered that
+ * cpuhp_state_add_instance() could use to de-allocate resources while
+ * rolling back state on the cores that ran the startup callback before
+ * the CPU on which the error was encountered.
+ */
ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
&pool->node);
+
+ /*
+ * We only needed the multi state instance add operation to invoke the
+ * startup callback for all cores without cores getting offlined. Since
+ * the acomp_ctx resources will now only be de-allocated when the pool
+ * is destroyed, we can safely remove the multi state instance. This
+ * minimizes (but does not eliminate) the possibility of
+ * zswap_cpu_comp_prepare() being invoked again due to a CPU
+ * offline-online transition. Removing the instance also prevents race
+ * conditions between CPU onlining after initial pool creation, and
+ * acomp_ctx_dealloc() freeing the acomp_ctx resources.
+ * Note that we delete the instance before checking the error status of
+ * the node list add operation because we want the instance removal even
+ * in case of errors in the former.
+ */
+ cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
+
if (ret)
- goto error;
+ goto ref_fail;
/* being the current pool takes 1 ref; this func expects the
* caller to always add the new pool as the current pool
@@ -403,7 +410,8 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
return pool;
ref_fail:
- cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
+ for_each_possible_cpu(cpu)
+ acomp_ctx_dealloc(per_cpu_ptr(pool->acomp_ctx, cpu));
error:
if (pool->acomp_ctx)
free_percpu(pool->acomp_ctx);
@@ -457,9 +465,13 @@ static struct zswap_pool *__zswap_pool_create_fallback(void)
static void zswap_pool_destroy(struct zswap_pool *pool)
{
+ int cpu;
+
zswap_pool_debug("destroying", pool);
- cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
+ for_each_possible_cpu(cpu)
+ acomp_ctx_dealloc(per_cpu_ptr(pool->acomp_ctx, cpu));
+
free_percpu(pool->acomp_ctx);
zpool_destroy_pool(pool->zpool);
@@ -912,31 +924,6 @@ static void zswap_entry_free(struct zswap_entry *entry)
/*********************************
* compressed storage functions
**********************************/
-static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(struct zswap_pool *pool)
-{
- struct crypto_acomp_ctx *acomp_ctx;
-
- for (;;) {
- acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
- mutex_lock(&acomp_ctx->mutex);
- if (likely(acomp_ctx->req))
- return acomp_ctx;
- /*
- * It is possible that we were migrated to a different CPU after
- * getting the per-CPU ctx but before the mutex was acquired. If
- * the old CPU got offlined, zswap_cpu_comp_dead() could have
- * already freed ctx->req (among other things) and set it to
- * NULL. Just try again on the new CPU that we ended up on.
- */
- mutex_unlock(&acomp_ctx->mutex);
- }
-}
-
-static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
-{
- mutex_unlock(&acomp_ctx->mutex);
-}
-
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
struct zswap_pool *pool)
{
@@ -949,7 +936,10 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
gfp_t gfp;
u8 *dst;
- acomp_ctx = acomp_ctx_get_cpu_lock(pool);
+ acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+
+ mutex_lock(&acomp_ctx->mutex);
+
dst = acomp_ctx->buffer;
sg_init_table(&input, 1);
sg_set_page(&input, page, PAGE_SIZE, 0);
@@ -997,7 +987,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
else if (alloc_ret)
zswap_reject_alloc_fail++;
- acomp_ctx_put_unlock(acomp_ctx);
+ mutex_unlock(&acomp_ctx->mutex);
return comp_ret == 0 && alloc_ret == 0;
}
@@ -1009,7 +999,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
int decomp_ret, dlen;
u8 *src, *obj;
- acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
+ acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+ mutex_lock(&acomp_ctx->mutex);
obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);
/*
@@ -1033,7 +1024,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
dlen = acomp_ctx->req->dlen;
zpool_obj_read_end(zpool, entry->handle, obj);
- acomp_ctx_put_unlock(acomp_ctx);
+ mutex_unlock(&acomp_ctx->mutex);
if (!decomp_ret && dlen == PAGE_SIZE)
return true;
@@ -1849,7 +1840,7 @@ static int zswap_setup(void)
ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
"mm/zswap_pool:prepare",
zswap_cpu_comp_prepare,
- zswap_cpu_comp_dead);
+ NULL);
if (ret)
goto hp_fail;
This patch simplifies the zswap_pool's per-CPU acomp_ctx resource
management. Similar to the per-CPU acomp_ctx itself, the per-CPU
acomp_ctx's resources' (acomp, req, buffer) lifetime will also be from
pool creation to pool deletion. These resources will persist through
CPU hotplug operations.

The zswap_cpu_comp_dead() teardown callback has been deleted from the
call to cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE). As a
result, CPU offline hotplug operations will be no-ops as far as the
acomp_ctx resources are concerned.

The main benefit of using the CPU hotplug multi state instance startup
callback to allocate the acomp_ctx resources is that it prevents the
cores from being offlined until the multi state instance addition call
returns.

From Documentation/core-api/cpu_hotplug.rst:
"The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations."

Furthermore, zswap_[de]compress() cannot contend with
zswap_cpu_comp_prepare() because:

 - During pool creation/deletion, the pool is not in the zswap_pools
   list.

 - During CPU hot[un]plug, the CPU is not yet online, as Yosry pointed
   out. zswap_cpu_comp_prepare() will be executed on a control CPU,
   since CPUHP_MM_ZSWP_POOL_PREPARE is in the PREPARE section of
   "enum cpuhp_state". Thanks Yosry for sharing this observation!

In both these cases, any recursions into zswap reclaim from
zswap_cpu_comp_prepare() will be handled by the old pool.

The above two observations enable the following simplifications:

1) zswap_cpu_comp_prepare(): CPU cannot be offlined. Reclaim cannot use
   the pool. Considerations for mutex init/locking and handling
   subsequent CPU hotplug online-offlines:

   Should we lock the mutex of the current CPU's acomp_ctx from start
   to end? It doesn't seem like this is required. CPU hotplug
   operations acquire the "cpuhp_state_mutex" before proceeding, hence
   invocations of zswap_cpu_comp_prepare() are serialized against CPU
   hotplug operations. If the process gets migrated while
   zswap_cpu_comp_prepare() is running, it will complete on the new
   CPU. In case of failures, we pass the acomp_ctx pointer obtained at
   the start of zswap_cpu_comp_prepare() to acomp_ctx_dealloc(), which,
   again, can at most be affected by process migration.

   There appear to be no contention scenarios that might cause
   inconsistent values of the acomp_ctx's members. Hence, it seems
   there is no need for mutex_lock(&acomp_ctx->mutex) in
   zswap_cpu_comp_prepare().

   Since the pool is not yet on the zswap_pools list, we don't need to
   initialize the per-CPU acomp_ctx mutex in zswap_pool_create(). This
   has been restored to occur in zswap_cpu_comp_prepare().

   zswap_cpu_comp_prepare() checks upfront if acomp_ctx->acomp is
   valid. If so, it returns success. This should handle any CPU hotplug
   online-offline transitions after pool creation is done.

2) CPU offline vis-a-vis zswap ops: Let's suppose the process is
   migrated to another CPU before the current CPU goes offline. If
   zswap_[de]compress() holds the acomp_ctx->mutex lock of the offlined
   CPU, that mutex will be released once it completes on the new CPU.
   Since there is no teardown callback, there is no possibility of UAF.

3) Pool creation/deletion and process migration to another CPU:

   - During pool creation/deletion, the pool is not in the zswap_pools
     list. Hence it cannot contend with zswap ops on that CPU. However,
     the process can get migrated.

     Pool creation --> zswap_cpu_comp_prepare() --> process migrated:
       * CPU offline: no-op.
       * zswap_cpu_comp_prepare() continues to run on the new CPU to
         finish allocating acomp_ctx resources for the offlined CPU.

     Pool deletion --> acomp_ctx_dealloc() --> process migrated:
       * CPU offline: no-op.
       * acomp_ctx_dealloc() continues to run on the new CPU to finish
         de-allocating acomp_ctx resources for the offlined CPU.

4) Pool deletion vis-a-vis CPU onlining: To prevent the possibility of
   race conditions between acomp_ctx_dealloc() freeing the acomp_ctx
   resources and the initial check for a valid acomp_ctx->acomp in
   zswap_cpu_comp_prepare(), we need to delete the multi state instance
   right after it is added, in zswap_pool_create().

Summary of changes based on the above:
--------------------------------------

1) Zero-initialization of pool->acomp_ctx in zswap_pool_create() to
   simplify and share common code for different error handling/cleanup
   related to the acomp_ctx.

2) Remove the multi state (node list) instance right after the node
   list add call in zswap_pool_create(). This prevents race conditions
   between CPU onlining after initial pool creation, and
   acomp_ctx_dealloc() freeing the acomp_ctx resources.

3) zswap_pool_destroy() will call acomp_ctx_dealloc() to de-allocate
   the per-CPU acomp_ctx resources.

4) Changes to zswap_cpu_comp_prepare():
   a) Check if acomp_ctx->acomp is valid at the beginning and return,
      because the acomp_ctx is already initialized.
   b) Move the mutex_init to happen in this procedure, before it
      returns.
   c) All error conditions are handled by calling acomp_ctx_dealloc().

5) New procedure acomp_ctx_dealloc() for common error/cleanup code.

6) No more multi state instance teardown callback. CPU offlining is a
   no-op as far as acomp_ctx resources are concerned.

7) Delete acomp_ctx_get_cpu_lock()/acomp_ctx_put_unlock(). Directly
   call mutex_lock(&acomp_ctx->mutex)/mutex_unlock(&acomp_ctx->mutex)
   in zswap_[de]compress().

The per-CPU memory cost of not deleting the acomp_ctx resources upon
CPU offlining, and only deleting them when the pool is destroyed, is as
follows, on x86_64:

  IAA with batching:    64.8 KB
  Software compressors:  8.2 KB

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 mm/zswap.c | 193 +++++++++++++++++++++++++----------------------------
 1 file changed, 92 insertions(+), 101 deletions(-)
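
A rough sanity check of the per-CPU cost quoted above, assuming the
2 * PAGE_SIZE compress/decompress buffer dominates (the IAA batch-slot
count is an assumption on my part, not something stated in this patch):

  Software compressors:  2 * 4 KB buffer + ~0.2 KB (tfm + request) ~= 8.2 KB
  IAA with batching:     ~8 batch slots * (2 * 4 KB buffer + request) ~= 64.8 KB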