From patchwork Fri Feb 10 09:05:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D345C636D3 for ; Fri, 10 Feb 2023 09:06:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231883AbjBJJGZ (ORCPT ); Fri, 10 Feb 2023 04:06:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39168 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231219AbjBJJF6 (ORCPT ); Fri, 10 Feb 2023 04:05:58 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 26F9163585; Fri, 10 Feb 2023 01:05:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676019929; x=1707555929; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=0DbfECaRxiOQUgGeyWMl1ZfzEksrLbe++d0Gmjbv5io=; b=UWFGI2G9E/Z82lZhr7Rdg0UrxhlNk2JoBVMi1vRsNnewdRK/BOxv5zHW DuN1IqbcSl39gOrxmZpgHr4ByB8wKeYnMU3L9fM/oduvdUTpkV4CCE6pR HYo2r7+RFx9jER1h2ZE+TzFRBre9bAkPCVp9vtjby8t4z9pKIogAEECWm Qyyy1OB2CRYfLbppTO5/u19bojnddC6XuyNrioqpRly9p4ricwGhaUQu1 zfS4B4/cFxVpJ4rC/d+F7M64NTQ5JHfBbfsQoMbcthXAUVx98RHewVcE+ ql8SmJvE6LW8OKvgSxH7mEWBD54rIdkFgA77+/FQOPqccdbvU9nx5hOa9 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="314018692" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="314018692" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:05:28 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="736669726" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="736669726" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:05:28 -0800 Subject: [PATCH v2 01/20] cxl/memdev: Fix endpoint port removal From: Dan Williams To: linux-cxl@vger.kernel.org Cc: vishal.l.verma@intel.com, dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:05:27 -0800 Message-ID: <167601992789.1924368.8083994227892600608.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org Testing of ram region support [1], stimulates a long standing bug in cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to inability to walk ports after dports have been unregistered. That results in a failure to re-register a memdev after the port is re-enabled leading to a crash like the following: cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 general protection fault, ... [..] RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] dev_name at include/linux/device.h:700 (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 [..] Call Trace: attach_target+0x39a/0x760 [cxl_core] ? __mutex_unlock_slowpath+0x3a/0x290 cxl_add_to_region+0xb8/0x340 [cxl_core] ? lockdep_hardirqs_on+0x7d/0x100 discover_region+0x4b/0x80 [cxl_port] ? __pfx_discover_region+0x10/0x10 [cxl_port] device_for_each_child+0x58/0x90 cxl_port_probe+0x10e/0x130 [cxl_port] cxl_bus_probe+0x17/0x50 [cxl_core] Change the port ancestry walk to be by depth rather than by dport. This ensures that even if a port has unregistered its dports a deferred memdev cleanup will still be able to cleanup the memdev's interest in that port. The parent_port->dev.driver check is only needed for determining if the bottom up removal beat the top-down removal, but cxl_ep_remove() can always proceed. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Link: http://lore.kernel.org/r/167564534874.847146.5222419648551436750.stgit@dwillia2-xfh.jf.intel.com [1] Signed-off-by: Dan Williams Reviewed-by: Vishal Verma --- drivers/cxl/core/memdev.c | 1 + drivers/cxl/core/port.c | 58 +++++++++++++++++++++++++-------------------- drivers/cxl/cxlmem.h | 2 ++ 3 files changed, 35 insertions(+), 26 deletions(-) diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c index a74a93310d26..3a8bc2b06047 100644 --- a/drivers/cxl/core/memdev.c +++ b/drivers/cxl/core/memdev.c @@ -246,6 +246,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds, if (rc < 0) goto err; cxlmd->id = rc; + cxlmd->depth = -1; dev = &cxlmd->dev; device_initialize(dev); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 410c036c09fa..317bcf4dbd9d 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1207,6 +1207,7 @@ int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint) get_device(&endpoint->dev); dev_set_drvdata(dev, endpoint); + cxlmd->depth = endpoint->depth; return devm_add_action_or_reset(dev, delete_endpoint, cxlmd); } EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL); @@ -1241,50 +1242,55 @@ static void reap_dports(struct cxl_port *port) } } +struct detach_ctx { + struct cxl_memdev *cxlmd; + int depth; +}; + +static int port_has_memdev(struct device *dev, const void *data) +{ + const struct detach_ctx *ctx = data; + struct cxl_port *port; + + if (!is_cxl_port(dev)) + return 0; + + port = to_cxl_port(dev); + if (port->depth != ctx->depth) + return 0; + + return !!cxl_ep_load(port, ctx->cxlmd); +} + static void cxl_detach_ep(void *data) { struct cxl_memdev *cxlmd = data; - struct device *iter; - for (iter = &cxlmd->dev; iter; iter = grandparent(iter)) { - struct device *dport_dev = grandparent(iter); + for (int i = cxlmd->depth - 1; i >= 1; i--) { struct cxl_port *port, *parent_port; + struct detach_ctx ctx = { + .cxlmd = cxlmd, + .depth = i, + }; + struct device *dev; struct cxl_ep *ep; bool died = false; - if (!dport_dev) - break; - - port = find_cxl_port(dport_dev, NULL); - if (!port) - continue; - - if (is_cxl_root(port)) { - put_device(&port->dev); + dev = bus_find_device(&cxl_bus_type, NULL, &ctx, + port_has_memdev); + if (!dev) continue; - } + port = to_cxl_port(dev); parent_port = to_cxl_port(port->dev.parent); device_lock(&parent_port->dev); - if (!parent_port->dev.driver) { - /* - * The bottom-up race to delete the port lost to a - * top-down port disable, give up here, because the - * parent_port ->remove() will have cleaned up all - * descendants. - */ - device_unlock(&parent_port->dev); - put_device(&port->dev); - continue; - } - device_lock(&port->dev); ep = cxl_ep_load(port, cxlmd); dev_dbg(&cxlmd->dev, "disconnect %s from %s\n", ep ? dev_name(ep->ep) : "", dev_name(&port->dev)); cxl_ep_remove(port, ep); if (ep && !port->dead && xa_empty(&port->endpoints) && - !is_cxl_root(parent_port)) { + !is_cxl_root(parent_port) && parent_port->dev.driver) { /* * This was the last ep attached to a dynamically * enumerated port. Block new cxl_add_ep() and garbage diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h index ab138004f644..c9da3c699a21 100644 --- a/drivers/cxl/cxlmem.h +++ b/drivers/cxl/cxlmem.h @@ -38,6 +38,7 @@ * @cxl_nvb: coordinate removal of @cxl_nvd if present * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem * @id: id number of this memdev instance. + * @depth: endpoint port depth */ struct cxl_memdev { struct device dev; @@ -47,6 +48,7 @@ struct cxl_memdev { struct cxl_nvdimm_bridge *cxl_nvb; struct cxl_nvdimm *cxl_nvd; int id; + int depth; }; static inline struct cxl_memdev *to_cxl_memdev(struct device *dev) From patchwork Fri Feb 10 09:05:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652414 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B21DC636D3 for ; Fri, 10 Feb 2023 09:06:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231715AbjBJJGl (ORCPT ); Fri, 10 Feb 2023 04:06:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39268 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231708AbjBJJGI (ORCPT ); Fri, 10 Feb 2023 04:06:08 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 308035A906; Fri, 10 Feb 2023 01:05:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676019941; x=1707555941; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3cQC8uwKOt3b1h6vNcSkmE8AzC7Sg9UhcvwbHDHzsBA=; b=dY/wlWSPmqvnTVPn7s/3qREwEx3YcYYzDVwHE6TNGzIE8v9Cm9WS5Epl F6E40ntp6WfjaQw+FWvh4vJX3tYQTjSUYCZ/Yy1i9+/5gqvFJaJD36j/K bjj85s5GQ3LHx3rtnqrYqx96jhlk+rXu0pCkdm/IOZ/RmMUEARlteshan ofcZbgNAQxp2UIx3ZK2U5rqrYoozTIXtHCoEKPHpvLBcN7hUaTv2aDm3Y VSxpsmEllWcNhnAXW2+IsHpBsJhrUpkO7i8vg/0vqL8fUSYerUVul1wY0 VC7DiWs0l3l/pqryeqcwMMW0aPdu3hv7Lw65eiTQ7GLADX+DPjfHPGswP A==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="416599982" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="416599982" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:05:40 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="669930135" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="669930135" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:05:39 -0800 Subject: [PATCH v2 03/20] cxl/region: Add a mode attribute for regions From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Vishal Verma , Dave Jiang , Gregory Price , Ira Weiny , Jonathan Cameron , Fan Ni , dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:05:39 -0800 Message-ID: <167601993930.1924368.4305018565539515665.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org In preparation for a new region type, "ram" regions, add a mode attribute to clarify the mode of the decoders that can be added to a region. Share the internals of mode_show() (for decoders) with the region case. Reviewed-by: Vishal Verma Reviewed-by: Dave Jiang Reviewed-by: Gregory Price Reviewed-by: Ira Weiny Reviewed-by: Jonathan Cameron Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564536041.847146.11330354943211409793.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- Documentation/ABI/testing/sysfs-bus-cxl | 11 +++++++++++ drivers/cxl/core/port.c | 12 +----------- drivers/cxl/core/region.c | 10 ++++++++++ drivers/cxl/cxl.h | 14 ++++++++++++++ 4 files changed, 36 insertions(+), 11 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl index 5be032313e29..058b0c45001f 100644 --- a/Documentation/ABI/testing/sysfs-bus-cxl +++ b/Documentation/ABI/testing/sysfs-bus-cxl @@ -358,6 +358,17 @@ Description: results in the same address being allocated. +What: /sys/bus/cxl/devices/regionZ/mode +Date: January, 2023 +KernelVersion: v6.3 +Contact: linux-cxl@vger.kernel.org +Description: + (RO) The mode of a region is established at region creation time + and dictates the mode of the endpoint decoder that comprise the + region. For more details on the possible modes see + /sys/bus/cxl/devices/decoderX.Y/mode + + What: /sys/bus/cxl/devices/regionZ/resource Date: May, 2022 KernelVersion: v6.0 diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 317bcf4dbd9d..1e541956f605 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -180,17 +180,7 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr, { struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev); - switch (cxled->mode) { - case CXL_DECODER_RAM: - return sysfs_emit(buf, "ram\n"); - case CXL_DECODER_PMEM: - return sysfs_emit(buf, "pmem\n"); - case CXL_DECODER_NONE: - return sysfs_emit(buf, "none\n"); - case CXL_DECODER_MIXED: - default: - return sysfs_emit(buf, "mixed\n"); - } + return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxled->mode)); } static ssize_t mode_store(struct device *dev, struct device_attribute *attr, diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 60828d01972a..17d2d0c12725 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -458,6 +458,15 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR_RO(resource); +static ssize_t mode_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct cxl_region *cxlr = to_cxl_region(dev); + + return sysfs_emit(buf, "%s\n", cxl_decoder_mode_name(cxlr->mode)); +} +static DEVICE_ATTR_RO(mode); + static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size) { struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); @@ -585,6 +594,7 @@ static struct attribute *cxl_region_attrs[] = { &dev_attr_interleave_granularity.attr, &dev_attr_resource.attr, &dev_attr_size.attr, + &dev_attr_mode.attr, NULL, }; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index aa3af3bb73b2..ca76879af1de 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -320,6 +320,20 @@ enum cxl_decoder_mode { CXL_DECODER_DEAD, }; +static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) +{ + static const char * const names[] = { + [CXL_DECODER_NONE] = "none", + [CXL_DECODER_RAM] = "ram", + [CXL_DECODER_PMEM] = "pmem", + [CXL_DECODER_MIXED] = "mixed", + }; + + if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED) + return names[mode]; + return "mixed"; +} + /** * struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder * @cxld: base cxl_decoder_object From patchwork Fri Feb 10 09:05:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88302C05027 for ; Fri, 10 Feb 2023 09:06:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231750AbjBJJGx (ORCPT ); Fri, 10 Feb 2023 04:06:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39072 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231409AbjBJJGN (ORCPT ); Fri, 10 Feb 2023 04:06:13 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 718A937713; Fri, 10 Feb 2023 01:05:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676019953; x=1707555953; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pOZoUMpBUKRLc8QIK289XEjoJsII4YkEt62Y5Arb8us=; b=ayHpBr5/S6SDefaGkDCl7ygH5IL2FZiTFvGE10pqlIFkSkwkhO1yTxIT Djues1HUKDGuS4yUcY2a/c2YKKMQo9FAdgqODujZkzm9Jkrx1UOzYsdsK Aw8yygItcbcW4rQx9Iv1ZfejyYcZTjmh3u3SIS2IbM3lhUsHtNW+2ZArA mXwT4qGfqiyFDBi75NgPWBSZ7FiP9f+tGw+/9/eNghQeKeyVlrHjJn5e3 ZBvJwTfA6fwGiN7nNmdVmHfjFno7GaIc6vbhBbhdX9R6qvfGLbifUh99Y 96veemMbkvk66OfAN+62RsVVUClh9JynZG4dj2WJ79kVG5SygvNvQV1Rs g==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="416600012" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="416600012" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:05:53 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="669930208" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="669930208" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:05:51 -0800 Subject: [PATCH v2 05/20] cxl/region: Validate region mode vs decoder mode From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Vishal Verma , Gregory Price , Dave Jiang , Ira Weiny , Jonathan Cameron , Fan Ni , dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:05:51 -0800 Message-ID: <167601995111.1924368.7459128614177994602.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org In preparation for a new region mode, do not, for example, allow 'ram' decoders to be assigned to 'pmem' regions and vice versa. Reviewed-by: Vishal Verma Reviewed-by: Gregory Price Reviewed-by: Dave Jiang Reviewed-by: Ira Weiny Reviewed-by: Jonathan Cameron Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564537131.847146.9020072654741860107.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/region.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 0fc80478ff6b..285835145e9b 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1221,6 +1221,12 @@ static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_dport *dport; int i, rc = -ENXIO; + if (cxled->mode != cxlr->mode) { + dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n", + dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode); + return -EINVAL; + } + if (cxled->mode == CXL_DECODER_DEAD) { dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev)); return -ENODEV; From patchwork Fri Feb 10 09:06:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652412 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 06FA0C6379F for ; Fri, 10 Feb 2023 09:07:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231843AbjBJJHX (ORCPT ); Fri, 10 Feb 2023 04:07:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39230 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231500AbjBJJGe (ORCPT ); Fri, 10 Feb 2023 04:06:34 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0091C2313F; Fri, 10 Feb 2023 01:06:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676019974; x=1707555974; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=s+JhIpX+g9sjvv5ZiADOIwQseU67ppQ0dtunIobOaPU=; b=FoSVZzCpEuK93BjViEPsgOsWDT/V0KFsVcwyCVR6CBKAMDaTxb+C/h7a gZ0S78N3FUi59hasDyfFVACMrsfnkwDsh73jTTmKZRgmD9Dkjx4OHExop gSR57dlpmyHSy2BOcUz/NGxK7emo4RrAKaUNttBkIv+XJ2dxs78lWMmHG f05pycl7xkdf4l5ldFoxBq9TKIy5NZVffpioCmLywXS2U7C4J66+OCVUD 7eW0paqHmG2DX76EckYUVhfICaB8+wDNcAY/Ac8AKmx5ND3U8MOxB10BS 8v40f6XJ2NiL5f3PkuHxIkVHO6k521ZY4OG/GNbouBHuxWUVbhAnp4AIp w==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="416600040" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="416600040" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:04 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="669930234" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="669930234" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:04 -0800 Subject: [PATCH v2 07/20] cxl/region: Refactor attach_target() for autodiscovery From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Vishal Verma , Dave Jiang , Ira Weiny , Jonathan Cameron , Fan Ni , dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:06:04 -0800 Message-ID: <167601996393.1924368.2202255054618600069.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org Region autodiscovery is the process of kernel creating 'struct cxl_region' object to represent active CXL memory ranges it finds already active in hardware when the driver loads. Typically this happens when platform firmware establishes CXL memory regions and then publishes them in the memory map. However, this can also happen in the case of kexec-reboot after the kernel has created regions. In the autodiscovery case the region creation process starts with a known endpoint decoder. Refactor attach_target() into a helper that is suitable to be called from either sysfs, for runtime region creation, or from cxl_port_probe() after it has enumerated all endpoint decoders. The cxl_port_probe() context is an async device-core probing context, so it is not appropriate to allow SIGTERM to interrupt the assembly process. Refactor attach_target() to take @cxled and @state as arguments where @state indicates whether waiting from the region rwsem is interruptible or not. No behavior change is intended. Reviewed-by: Vishal Verma Reviewed-by: Dave Jiang Reviewed-by: Ira Weiny Reviewed-by: Jonathan Cameron Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564538227.847146.16305045998592488364.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/region.c | 47 +++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index e440db8611a4..040bbd39c81d 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1422,31 +1422,25 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled) up_write(&cxl_region_rwsem); } -static int attach_target(struct cxl_region *cxlr, const char *decoder, int pos) +static int attach_target(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos, + unsigned int state) { - struct device *dev; - int rc; - - dev = bus_find_device_by_name(&cxl_bus_type, NULL, decoder); - if (!dev) - return -ENODEV; - - if (!is_endpoint_decoder(dev)) { - put_device(dev); - return -EINVAL; - } + int rc = 0; - rc = down_write_killable(&cxl_region_rwsem); + if (state == TASK_INTERRUPTIBLE) + rc = down_write_killable(&cxl_region_rwsem); + else + down_write(&cxl_region_rwsem); if (rc) - goto out; + return rc; + down_read(&cxl_dpa_rwsem); - rc = cxl_region_attach(cxlr, to_cxl_endpoint_decoder(dev), pos); + rc = cxl_region_attach(cxlr, cxled, pos); if (rc == 0) set_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags); up_read(&cxl_dpa_rwsem); up_write(&cxl_region_rwsem); -out: - put_device(dev); return rc; } @@ -1484,8 +1478,23 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos, if (sysfs_streq(buf, "\n")) rc = detach_target(cxlr, pos); - else - rc = attach_target(cxlr, buf, pos); + else { + struct device *dev; + + dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf); + if (!dev) + return -ENODEV; + + if (!is_endpoint_decoder(dev)) { + rc = -EINVAL; + goto out; + } + + rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos, + TASK_INTERRUPTIBLE); +out: + put_device(dev); + } if (rc < 0) return rc; From patchwork Fri Feb 10 09:06:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B63F2C6379F for ; Fri, 10 Feb 2023 09:07:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231701AbjBJJHc (ORCPT ); Fri, 10 Feb 2023 04:07:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231476AbjBJJGj (ORCPT ); Fri, 10 Feb 2023 04:06:39 -0500 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 26A5F2313C; Fri, 10 Feb 2023 01:06:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676019977; x=1707555977; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tyAA/0gIK0IYL4MQ6rj9f+aiydACvqCY4T+pqHWSbEY=; b=bKTN36axc+rzMWvS/i4F0FV/dF+4f2EdC0pT+uxCXFGpEXqbDFCSOKwI B6dZxVHaHIL/poGlDc0bif1Bw0vykKcIOU+3G4PvQ47KNVnMutsXqQZO0 IZlVPzT5G+Ex2eg3+pSGF1JV2XHPPtJ4Lx5QCPcYQXediwK8JpBNX79rw Mcae5TGNS+7E53StpzOAkB1vGwtjwd1dOQ/lJGCwC7uDz1DoIjaUQXRdj zmsYog7725RSKassZHzRdIrhsll2CmdD4YGMAPEXeyx6A12iJbBFwS3N+ 1OC+vnHT1Fvk/47e8jSiBQH7n5xoy3NtQf1dD9lk37pnmAqeUXWg8ubDd w==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="329003041" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="329003041" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:16 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="913463794" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="913463794" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:16 -0800 Subject: [PATCH v2 09/20] cxl/region: Move region-position validation to a helper From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Vishal Verma , Fan Ni , dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:06:15 -0800 Message-ID: <167601997584.1924368.4615769326126138969.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org In preparation for region autodiscovery, that needs all devices discovered before their relative position in the region can be determined, consolidate all position dependent validation in a helper. Recall that in the on-demand region creation flow the end-user picks the position of a given endpoint decoder in a region. In the autodiscovery case the position of an endpoint decoder can only be determined after all other endpoint decoders that claim to decode the region's address range have been enumerated and attached. So, in the autodiscovery case endpoint decoders may be attached before their relative position is known. Once all decoders arrive, then positions can be determined and validated with cxl_region_validate_position() the same as user initiated on-demand creation. Reviewed-by: Vishal Verma Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564538779.847146.8356062886811511706.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/region.c | 119 +++++++++++++++++++++++++++++---------------- 1 file changed, 76 insertions(+), 43 deletions(-) diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index ae7d3adcd41a..691605f1e120 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1211,35 +1211,13 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr) return 0; } -static int cxl_region_attach(struct cxl_region *cxlr, - struct cxl_endpoint_decoder *cxled, int pos) +static int cxl_region_validate_position(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, + int pos) { - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); - struct cxl_port *ep_port, *root_port, *iter; struct cxl_region_params *p = &cxlr->params; - struct cxl_dport *dport; - int i, rc = -ENXIO; - - if (cxled->mode != cxlr->mode) { - dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n", - dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode); - return -EINVAL; - } - - if (cxled->mode == CXL_DECODER_DEAD) { - dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev)); - return -ENODEV; - } - - /* all full of members, or interleave config not established? */ - if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) { - dev_dbg(&cxlr->dev, "region already active\n"); - return -EBUSY; - } else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) { - dev_dbg(&cxlr->dev, "interleave config missing\n"); - return -ENXIO; - } + int i; if (pos < 0 || pos >= p->interleave_ways) { dev_dbg(&cxlr->dev, "position %d out of range %d\n", pos, @@ -1278,6 +1256,71 @@ static int cxl_region_attach(struct cxl_region *cxlr, } } + return 0; +} + +static int cxl_region_attach_position(struct cxl_region *cxlr, + struct cxl_root_decoder *cxlrd, + struct cxl_endpoint_decoder *cxled, + const struct cxl_dport *dport, int pos) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *iter; + int rc; + + if (cxlrd->calc_hb(cxlrd, pos) != dport) { + dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + dev_name(&cxlrd->cxlsd.cxld.dev)); + return -ENXIO; + } + + for (iter = cxled_to_port(cxled); !is_cxl_root(iter); + iter = to_cxl_port(iter->dev.parent)) { + rc = cxl_port_attach_region(iter, cxlr, cxled, pos); + if (rc) + goto err; + } + + return 0; + +err: + for (iter = cxled_to_port(cxled); !is_cxl_root(iter); + iter = to_cxl_port(iter->dev.parent)) + cxl_port_detach_region(iter, cxlr, cxled); + return rc; +} + +static int cxl_region_attach(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos) +{ + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_region_params *p = &cxlr->params; + struct cxl_port *ep_port, *root_port; + struct cxl_dport *dport; + int rc = -ENXIO; + + if (cxled->mode != cxlr->mode) { + dev_dbg(&cxlr->dev, "%s region mode: %d mismatch: %d\n", + dev_name(&cxled->cxld.dev), cxlr->mode, cxled->mode); + return -EINVAL; + } + + if (cxled->mode == CXL_DECODER_DEAD) { + dev_dbg(&cxlr->dev, "%s dead\n", dev_name(&cxled->cxld.dev)); + return -ENODEV; + } + + /* all full of members, or interleave config not established? */ + if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) { + dev_dbg(&cxlr->dev, "region already active\n"); + return -EBUSY; + } else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) { + dev_dbg(&cxlr->dev, "interleave config missing\n"); + return -ENXIO; + } + ep_port = cxled_to_port(cxled); root_port = cxlrd_to_port(cxlrd); dport = cxl_find_dport_by_dev(root_port, ep_port->host_bridge); @@ -1288,13 +1331,6 @@ static int cxl_region_attach(struct cxl_region *cxlr, return -ENXIO; } - if (cxlrd->calc_hb(cxlrd, pos) != dport) { - dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n", - dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), - dev_name(&cxlrd->cxlsd.cxld.dev)); - return -ENXIO; - } - if (cxled->cxld.target_type != cxlr->type) { dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n", dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), @@ -1318,12 +1354,13 @@ static int cxl_region_attach(struct cxl_region *cxlr, return -EINVAL; } - for (iter = ep_port; !is_cxl_root(iter); - iter = to_cxl_port(iter->dev.parent)) { - rc = cxl_port_attach_region(iter, cxlr, cxled, pos); - if (rc) - goto err; - } + rc = cxl_region_validate_position(cxlr, cxled, pos); + if (rc) + return rc; + + rc = cxl_region_attach_position(cxlr, cxlrd, cxled, dport, pos); + if (rc) + return rc; p->targets[pos] = cxled; cxled->pos = pos; @@ -1349,10 +1386,6 @@ static int cxl_region_attach(struct cxl_region *cxlr, p->nr_targets--; cxled->pos = -1; p->targets[pos] = NULL; -err: - for (iter = ep_port; !is_cxl_root(iter); - iter = to_cxl_port(iter->dev.parent)) - cxl_port_detach_region(iter, cxlr, cxled); return rc; } From patchwork Fri Feb 10 09:06:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652410 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8540FC64EC4 for ; Fri, 10 Feb 2023 09:07:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231923AbjBJJHk (ORCPT ); Fri, 10 Feb 2023 04:07:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231629AbjBJJGu (ORCPT ); Fri, 10 Feb 2023 04:06:50 -0500 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBBF33EFC5; Fri, 10 Feb 2023 01:06:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676019989; x=1707555989; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=QvyAdNBc9ZYsmtnrtNpMiAkkV624GUaybC07KzHXR5s=; b=ROTz6GR1lLgZ/UGNGI1F5F3KZIcEtNPAQipPFHlhx+l2ym3IZUOe/rka v21sG4kvO5KJ6PDn0qhC+au9KGGeqStppK03cg+DOkqlo7OTzC23Vd/4l e/58E5xg52nG0nCAQsivYn7qf4PEvVKpknIvheTQxL+zBJgNxcY4BzCNt KcaQLayHLJ2Ae0C6nFsesny21iqnTPUSD6urzLmSNvDFI1EZk72+GXZYC HhKB9Osht0Bnt5LepzMoIky4nYHZxlAd2S0LgM/ZQz9FsAhAn+iuPN6wh C/fZ+1BzpAFQVsG1ZllOdhtiKb+Vd8fsH4hyC4h7oJI0q85wGy3/BvJ6/ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="329003076" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="329003076" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:28 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="913463849" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="913463849" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:28 -0800 Subject: [PATCH v2 11/20] cxl/region: Enable CONFIG_CXL_REGION to be toggled From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Vishal Verma , Jonathan Cameron , Dave Jiang , Gregory Price , Fan Ni , dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:06:27 -0800 Message-ID: <167601998765.1924368.258370414771847699.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org Add help text and a label so the CXL_REGION config option can be toggled. This is mainly to enable compile testing without region support. Reviewed-by: Vishal Verma Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Reviewed-by: Gregory Price Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564539875.847146.16213498614174558767.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- drivers/cxl/Kconfig | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 0ac53c422c31..163c094e67ae 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -104,12 +104,22 @@ config CXL_SUSPEND depends on SUSPEND && CXL_MEM config CXL_REGION - bool + bool "CXL: Region Support" default CXL_BUS # For MAX_PHYSMEM_BITS depends on SPARSEMEM select MEMREGION select GET_FREE_REGION + help + Enable the CXL core to enumerate and provision CXL regions. A CXL + region is defined by one or more CXL expanders that decode a given + system-physical address range. For CXL regions established by + platform-firmware this option enables memory error handling to + identify the devices participating in a given interleaved memory + range. Otherwise, platform-firmware managed CXL is enabled by being + placed in the system address map and does not need a driver. + + If unsure say 'y' config CXL_REGION_INVALIDATION_TEST bool "CXL: Region Cache Management Bypass (TEST)" From patchwork Fri Feb 10 09:06:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E8D6C636D3 for ; Fri, 10 Feb 2023 09:07:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231720AbjBJJHq (ORCPT ); Fri, 10 Feb 2023 04:07:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231737AbjBJJHA (ORCPT ); Fri, 10 Feb 2023 04:07:00 -0500 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABF84BB9A; Fri, 10 Feb 2023 01:06:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676020000; x=1707556000; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xg87SwclHTz/KBBQ2o6kd8aRFgOzWf7Clc07w0ozOXQ=; b=W32/MnAvCX9l6dwMwhZWBSPTkKacNetL9nYsnQzBzBtQssAn2h2lZsxe kFbLzQNf8TZo0x4CTB27IhAxpn07IIPpRKyRVb63CphZNaXcvFMAGG9kl s9OlY90WrCH+T8XnTUVL4pQkXdsHlLILRdMNtaYZJ8F+CoZU5G9NyOr9A WZRSe5OjQ8GtbQHg0KU9pxkyXgKwAcITYbvDUbpV08rFlIShx0LVWb0Es 3aLa6ygt3eMxS6vw/fPxKJIiUaZnkNJE8KgUqGztB2KSHnWXv7snQ3ePp pscz884uaJ1VFmU+Le89RuGd2ewvzlaCyUYXJgPRARCbpx1SoSRjhkaoD A==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="329003112" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="329003112" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:40 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="913463913" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="913463913" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:40 -0800 Subject: [PATCH v2 13/20] cxl/region: Add region autodiscovery From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Fan Ni , vishal.l.verma@intel.com, dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:06:39 -0800 Message-ID: <167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564540972.847146.17096178433176097831.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- drivers/cxl/core/hdm.c | 11 + drivers/cxl/core/port.c | 2 drivers/cxl/core/region.c | 497 ++++++++++++++++++++++++++++++++++++++++++++- drivers/cxl/cxl.h | 29 +++ drivers/cxl/port.c | 48 ++++ 5 files changed, 576 insertions(+), 11 deletions(-) diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index a0891c3464f1..8c29026a4b9d 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -676,6 +676,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; + /* Userspace is now responsible for reconfiguring this decoder */ + if (is_endpoint_decoder(&cxld->dev)) { + struct cxl_endpoint_decoder *cxled; + + cxled = to_cxl_endpoint_decoder(&cxld->dev); + cxled->state = CXL_DECODER_STATE_MANUAL; + } + return 0; } @@ -783,6 +791,9 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld, return rc; } *dpa_base += dpa_size + skip; + + cxled->state = CXL_DECODER_STATE_AUTO; + return 0; } diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 9e5df64ea6b5..59620528571a 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -446,6 +446,7 @@ bool is_endpoint_decoder(struct device *dev) { return dev->type == &cxl_decoder_endpoint_type; } +EXPORT_SYMBOL_NS_GPL(is_endpoint_decoder, CXL); bool is_root_decoder(struct device *dev) { @@ -1628,6 +1629,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, } cxlrd->calc_hb = calc_hb; + mutex_init(&cxlrd->range_lock); cxld = &cxlsd->cxld; cxld->dev.type = &cxl_decoder_root_type; diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 691605f1e120..3f6453da2c51 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -524,7 +525,12 @@ static void cxl_region_iomem_release(struct cxl_region *cxlr) if (device_is_registered(&cxlr->dev)) lockdep_assert_held_write(&cxl_region_rwsem); if (p->res) { - remove_resource(p->res); + /* + * Autodiscovered regions may not have been able to insert their + * resource. + */ + if (p->res->parent) + remove_resource(p->res); kfree(p->res); p->res = NULL; } @@ -1105,12 +1111,35 @@ static int cxl_port_setup_targets(struct cxl_port *port, return rc; } - cxld->interleave_ways = iw; - cxld->interleave_granularity = ig; - cxld->hpa_range = (struct range) { - .start = p->res->start, - .end = p->res->end, - }; + if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { + if (cxld->interleave_ways != iw || + cxld->interleave_granularity != ig || + cxld->hpa_range.start != p->res->start || + cxld->hpa_range.end != p->res->end || + ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) { + dev_err(&cxlr->dev, + "%s:%s %s expected iw: %d ig: %d %pr\n", + dev_name(port->uport), dev_name(&port->dev), + __func__, iw, ig, p->res); + dev_err(&cxlr->dev, + "%s:%s %s got iw: %d ig: %d state: %s %#llx:%#llx\n", + dev_name(port->uport), dev_name(&port->dev), + __func__, cxld->interleave_ways, + cxld->interleave_granularity, + (cxld->flags & CXL_DECODER_F_ENABLE) ? + "enabled" : + "disabled", + cxld->hpa_range.start, cxld->hpa_range.end); + return -ENXIO; + } + } else { + cxld->interleave_ways = iw; + cxld->interleave_granularity = ig; + cxld->hpa_range = (struct range) { + .start = p->res->start, + .end = p->res->end, + }; + } dev_dbg(&cxlr->dev, "%s:%s iw: %d ig: %d\n", dev_name(port->uport), dev_name(&port->dev), iw, ig); add_target: @@ -1121,7 +1150,17 @@ static int cxl_port_setup_targets(struct cxl_port *port, dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), pos); return -ENXIO; } - cxlsd->target[cxl_rr->nr_targets_set] = ep->dport; + if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { + if (cxlsd->target[cxl_rr->nr_targets_set] != ep->dport) { + dev_dbg(&cxlr->dev, "%s:%s: %s expected %s at %d\n", + dev_name(port->uport), dev_name(&port->dev), + dev_name(&cxlsd->cxld.dev), + dev_name(ep->dport->dport), + cxl_rr->nr_targets_set); + return -ENXIO; + } + } else + cxlsd->target[cxl_rr->nr_targets_set] = ep->dport; inc = 1; out_target_set: cxl_rr->nr_targets_set += inc; @@ -1163,6 +1202,13 @@ static void cxl_region_teardown_targets(struct cxl_region *cxlr) struct cxl_ep *ep; int i; + /* + * In the auto-discovery case skip automatic teardown since the + * address space is already active + */ + if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) + return; + for (i = 0; i < p->nr_targets; i++) { cxled = p->targets[i]; cxlmd = cxled_to_memdev(cxled); @@ -1195,8 +1241,8 @@ static int cxl_region_setup_targets(struct cxl_region *cxlr) iter = to_cxl_port(iter->dev.parent); /* - * Descend the topology tree programming targets while - * looking for conflicts. + * Descend the topology tree programming / validating + * targets while looking for conflicts. */ for (ep = cxl_ep_load(iter, cxlmd); iter; iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) { @@ -1291,6 +1337,185 @@ static int cxl_region_attach_position(struct cxl_region *cxlr, return rc; } +static int cxl_region_attach_auto(struct cxl_region *cxlr, + struct cxl_endpoint_decoder *cxled, int pos) +{ + struct cxl_region_params *p = &cxlr->params; + + if (cxled->state != CXL_DECODER_STATE_AUTO) { + dev_err(&cxlr->dev, + "%s: unable to add decoder to autodetected region\n", + dev_name(&cxled->cxld.dev)); + return -EINVAL; + } + + if (pos >= 0) { + dev_dbg(&cxlr->dev, "%s: expected auto position, not %d\n", + dev_name(&cxled->cxld.dev), pos); + return -EINVAL; + } + + if (p->nr_targets >= p->interleave_ways) { + dev_err(&cxlr->dev, "%s: no more target slots available\n", + dev_name(&cxled->cxld.dev)); + return -ENXIO; + } + + /* + * Temporarily record the endpoint decoder into the target array. Yes, + * this means that userspace can view devices in the wrong position + * before the region activates, and must be careful to understand when + * it might be racing region autodiscovery. + */ + pos = p->nr_targets; + p->targets[pos] = cxled; + cxled->pos = pos; + p->nr_targets++; + + return 0; +} + +static struct cxl_port *next_port(struct cxl_port *port) +{ + if (!port->parent_dport) + return NULL; + return port->parent_dport->port; +} + +static int decoder_match_range(struct device *dev, void *data) +{ + struct cxl_endpoint_decoder *cxled = data; + struct cxl_switch_decoder *cxlsd; + + if (!is_switch_decoder(dev)) + return 0; + + cxlsd = to_cxl_switch_decoder(dev); + return range_contains(&cxlsd->cxld.hpa_range, &cxled->cxld.hpa_range); +} + +static void find_positions(const struct cxl_switch_decoder *cxlsd, + const struct cxl_port *iter_a, + const struct cxl_port *iter_b, int *a_pos, + int *b_pos) +{ + int i; + + for (i = 0, *a_pos = -1, *b_pos = -1; i < cxlsd->nr_targets; i++) { + if (cxlsd->target[i] == iter_a->parent_dport) + *a_pos = i; + else if (cxlsd->target[i] == iter_b->parent_dport) + *b_pos = i; + if (*a_pos >= 0 && *b_pos >= 0) + break; + } +} + +static int cmp_decode_pos(const void *a, const void *b) +{ + struct cxl_endpoint_decoder *cxled_a = *(typeof(cxled_a) *)a; + struct cxl_endpoint_decoder *cxled_b = *(typeof(cxled_b) *)b; + struct cxl_memdev *cxlmd_a = cxled_to_memdev(cxled_a); + struct cxl_memdev *cxlmd_b = cxled_to_memdev(cxled_b); + struct cxl_port *port_a = cxled_to_port(cxled_a); + struct cxl_port *port_b = cxled_to_port(cxled_b); + struct cxl_port *iter_a, *iter_b, *port = NULL; + struct cxl_switch_decoder *cxlsd; + struct device *dev; + int a_pos, b_pos; + unsigned int seq; + + /* Exit early if any prior sorting failed */ + if (cxled_a->pos < 0 || cxled_b->pos < 0) + return 0; + + /* + * Walk up the hierarchy to find a shared port, find the decoder that + * maps the range, compare the relative position of those dport + * mappings. + */ + for (iter_a = port_a; iter_a; iter_a = next_port(iter_a)) { + struct cxl_port *next_a, *next_b; + + next_a = next_port(iter_a); + if (!next_a) + break; + + for (iter_b = port_b; iter_b; iter_b = next_port(iter_b)) { + next_b = next_port(iter_b); + if (next_a != next_b) + continue; + port = next_a; + break; + } + + if (port) + break; + } + + if (!port) { + dev_err(cxlmd_a->dev.parent, + "failed to find shared port with %s\n", + dev_name(cxlmd_b->dev.parent)); + goto err; + } + + dev = device_find_child(&port->dev, cxled_a, decoder_match_range); + if (!dev) { + struct range *range = &cxled_a->cxld.hpa_range; + + dev_err(port->uport, + "failed to find decoder that maps %#llx-%#llx\n", + range->start, range->end); + goto err; + } + + cxlsd = to_cxl_switch_decoder(dev); + do { + seq = read_seqbegin(&cxlsd->target_lock); + find_positions(cxlsd, iter_a, iter_b, &a_pos, &b_pos); + } while (read_seqretry(&cxlsd->target_lock, seq)); + + put_device(dev); + + if (a_pos < 0 || b_pos < 0) { + dev_err(port->uport, + "failed to find shared decoder for %s and %s\n", + dev_name(cxlmd_a->dev.parent), + dev_name(cxlmd_b->dev.parent)); + goto err; + } + + dev_dbg(port->uport, "%s comes %s %s\n", dev_name(cxlmd_a->dev.parent), + a_pos - b_pos < 0 ? "before" : "after", + dev_name(cxlmd_b->dev.parent)); + + return a_pos - b_pos; +err: + cxled_a->pos = -1; + return 0; +} + +static int cxl_region_sort_targets(struct cxl_region *cxlr) +{ + struct cxl_region_params *p = &cxlr->params; + int i, rc = 0; + + sort(p->targets, p->nr_targets, sizeof(p->targets[0]), cmp_decode_pos, + NULL); + + for (i = 0; i < p->nr_targets; i++) { + struct cxl_endpoint_decoder *cxled = p->targets[i]; + + if (cxled->pos < 0) + rc = -ENXIO; + cxled->pos = i; + } + + dev_dbg(&cxlr->dev, "region sort %s\n", rc ? "failed" : "successful"); + return rc; +} + static int cxl_region_attach(struct cxl_region *cxlr, struct cxl_endpoint_decoder *cxled, int pos) { @@ -1354,6 +1579,50 @@ static int cxl_region_attach(struct cxl_region *cxlr, return -EINVAL; } + if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { + int i; + + rc = cxl_region_attach_auto(cxlr, cxled, pos); + if (rc) + return rc; + + /* await more targets to arrive... */ + if (p->nr_targets < p->interleave_ways) + return 0; + + /* + * All targets are here, which implies all PCI enumeration that + * affects this region has been completed. Walk the topology to + * sort the devices into their relative region decode position. + */ + rc = cxl_region_sort_targets(cxlr); + if (rc) + return rc; + + for (i = 0; i < p->nr_targets; i++) { + cxled = p->targets[i]; + ep_port = cxled_to_port(cxled); + dport = cxl_find_dport_by_dev(root_port, + ep_port->host_bridge); + rc = cxl_region_attach_position(cxlr, cxlrd, cxled, + dport, i); + if (rc) + return rc; + } + + rc = cxl_region_setup_targets(cxlr); + if (rc) + return rc; + + /* + * If target setup succeeds in the autodiscovery case + * then the region is already committed. + */ + p->state = CXL_CONFIG_COMMIT; + + return 0; + } + rc = cxl_region_validate_position(cxlr, cxled, pos); if (rc) return rc; @@ -2087,6 +2356,193 @@ static int devm_cxl_add_pmem_region(struct cxl_region *cxlr) return rc; } +static int match_decoder_by_range(struct device *dev, void *data) +{ + struct range *r1, *r2 = data; + struct cxl_root_decoder *cxlrd; + + if (!is_root_decoder(dev)) + return 0; + + cxlrd = to_cxl_root_decoder(dev); + r1 = &cxlrd->cxlsd.cxld.hpa_range; + return range_contains(r1, r2); +} + +static int match_region_by_range(struct device *dev, void *data) +{ + struct cxl_region_params *p; + struct cxl_region *cxlr; + struct range *r = data; + int rc = 0; + + if (!is_cxl_region(dev)) + return 0; + + cxlr = to_cxl_region(dev); + p = &cxlr->params; + + down_read(&cxl_region_rwsem); + if (p->res && p->res->start == r->start && p->res->end == r->end) + rc = 1; + up_read(&cxl_region_rwsem); + + return rc; +} + +/* Establish an empty region covering the given HPA range */ +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd, + struct cxl_endpoint_decoder *cxled) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct cxl_port *port = cxlrd_to_port(cxlrd); + struct range *hpa = &cxled->cxld.hpa_range; + struct cxl_region_params *p; + struct cxl_region *cxlr; + struct resource *res; + int rc; + + do { + cxlr = __create_region(cxlrd, cxled->mode, + atomic_read(&cxlrd->region_id)); + } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY); + + if (IS_ERR(cxlr)) { + dev_err(cxlmd->dev.parent, + "%s:%s: %s failed assign region: %ld\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + __func__, PTR_ERR(cxlr)); + return cxlr; + } + + down_write(&cxl_region_rwsem); + p = &cxlr->params; + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) { + dev_err(cxlmd->dev.parent, + "%s:%s: %s autodiscovery interrupted\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + __func__); + rc = -EBUSY; + goto err; + } + + set_bit(CXL_REGION_F_AUTO, &cxlr->flags); + + res = kmalloc(sizeof(*res), GFP_KERNEL); + if (!res) { + rc = -ENOMEM; + goto err; + } + + *res = DEFINE_RES_MEM_NAMED(hpa->start, range_len(hpa), + dev_name(&cxlr->dev)); + rc = insert_resource(cxlrd->res, res); + if (rc) { + /* + * Platform-firmware may not have split resources like "System + * RAM" on CXL window boundaries see cxl_region_iomem_release() + */ + dev_warn(cxlmd->dev.parent, + "%s:%s: %s %s cannot insert resource\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), + __func__, dev_name(&cxlr->dev)); + } + + p->res = res; + p->interleave_ways = cxled->cxld.interleave_ways; + p->interleave_granularity = cxled->cxld.interleave_granularity; + p->state = CXL_CONFIG_INTERLEAVE_ACTIVE; + + rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group()); + if (rc) + goto err; + + dev_dbg(cxlmd->dev.parent, "%s:%s: %s %s res: %pr iw: %d ig: %d\n", + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev), __func__, + dev_name(&cxlr->dev), p->res, p->interleave_ways, + p->interleave_granularity); + + /* ...to match put_device() in cxl_add_to_region() */ + get_device(&cxlr->dev); + up_write(&cxl_region_rwsem); + + return cxlr; + +err: + up_write(&cxl_region_rwsem); + devm_release_action(port->uport, unregister_region, cxlr); + return ERR_PTR(rc); +} + +int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled) +{ + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); + struct range *hpa = &cxled->cxld.hpa_range; + struct cxl_decoder *cxld = &cxled->cxld; + struct cxl_root_decoder *cxlrd; + struct cxl_region_params *p; + struct cxl_region *cxlr; + bool attach = false; + struct device *dev; + int rc; + + dev = device_find_child(&root->dev, &cxld->hpa_range, + match_decoder_by_range); + if (!dev) { + dev_err(cxlmd->dev.parent, + "%s:%s no CXL window for range %#llx:%#llx\n", + dev_name(&cxlmd->dev), dev_name(&cxld->dev), + cxld->hpa_range.start, cxld->hpa_range.end); + return -ENXIO; + } + + cxlrd = to_cxl_root_decoder(dev); + + /* + * Ensure that if multiple threads race to construct_region() for @hpa + * one does the construction and the others add to that. + */ + mutex_lock(&cxlrd->range_lock); + dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa, + match_region_by_range); + if (!dev) + cxlr = construct_region(cxlrd, cxled); + else + cxlr = to_cxl_region(dev); + mutex_unlock(&cxlrd->range_lock); + + if (IS_ERR(cxlr)) { + rc = PTR_ERR(cxlr); + goto out; + } + + attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE); + + down_read(&cxl_region_rwsem); + p = &cxlr->params; + attach = p->state == CXL_CONFIG_COMMIT; + up_read(&cxl_region_rwsem); + + if (attach) { + int rc = device_attach(&cxlr->dev); + + /* + * If device_attach() fails the range may still be active via + * the platform-firmware memory map, otherwise the driver for + * regions is local to this file, so driver matching can't fail. + */ + if (rc < 0) + dev_err(&cxlr->dev, "failed to enable, range: %pr\n", + p->res); + } + + put_device(&cxlr->dev); +out: + put_device(&cxlrd->cxlsd.cxld.dev); + return rc; +} +EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, CXL); + static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) { if (!test_bit(CXL_REGION_F_INCOHERENT, &cxlr->flags)) @@ -2111,6 +2567,15 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) return 0; } +static int is_system_ram(struct resource *res, void *arg) +{ + struct cxl_region *cxlr = arg; + struct cxl_region_params *p = &cxlr->params; + + dev_dbg(&cxlr->dev, "%pr has System RAM: %pr\n", p->res, res); + return 1; +} + static int cxl_region_probe(struct device *dev) { struct cxl_region *cxlr = to_cxl_region(dev); @@ -2144,6 +2609,18 @@ static int cxl_region_probe(struct device *dev) switch (cxlr->mode) { case CXL_DECODER_PMEM: return devm_cxl_add_pmem_region(cxlr); + case CXL_DECODER_RAM: + /* + * The region can not be manged by CXL if any portion of + * it is already online as 'System RAM' + */ + if (walk_iomem_res_desc(IORES_DESC_NONE, + IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, + p->res->start, p->res->end, cxlr, + is_system_ram) > 0) + return 0; + dev_dbg(dev, "TODO: hookup devdax\n"); + return 0; default: dev_dbg(&cxlr->dev, "unsupported region mode: %d\n", cxlr->mode); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index ca76879af1de..c8ee4bb8cce6 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -261,6 +261,8 @@ resource_size_t cxl_rcrb_to_component(struct device *dev, * cxl_decoder flags that define the type of memory / devices this * decoder supports as well as configuration lock status See "CXL 2.0 * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details. + * Additionally indicate whether decoder settings were autodetected, + * user customized. */ #define CXL_DECODER_F_RAM BIT(0) #define CXL_DECODER_F_PMEM BIT(1) @@ -334,12 +336,22 @@ static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode) return "mixed"; } +/* + * Track whether this decoder is reserved for region autodiscovery, or + * free for userspace provisioning. + */ +enum cxl_decoder_state { + CXL_DECODER_STATE_MANUAL, + CXL_DECODER_STATE_AUTO, +}; + /** * struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder * @cxld: base cxl_decoder_object * @dpa_res: actively claimed DPA span of this decoder * @skip: offset into @dpa_res where @cxld.hpa_range maps * @mode: which memory type / access-mode-partition this decoder targets + * @state: autodiscovery state * @pos: interleave position in @cxld.region */ struct cxl_endpoint_decoder { @@ -347,6 +359,7 @@ struct cxl_endpoint_decoder { struct resource *dpa_res; resource_size_t skip; enum cxl_decoder_mode mode; + enum cxl_decoder_state state; int pos; }; @@ -380,6 +393,7 @@ typedef struct cxl_dport *(*cxl_calc_hb_fn)(struct cxl_root_decoder *cxlrd, * @region_id: region id for next region provisioning event * @calc_hb: which host bridge covers the n'th position by granularity * @platform_data: platform specific configuration data + * @range_lock: sync region autodiscovery by address range * @cxlsd: base cxl switch decoder */ struct cxl_root_decoder { @@ -387,6 +401,7 @@ struct cxl_root_decoder { atomic_t region_id; cxl_calc_hb_fn calc_hb; void *platform_data; + struct mutex range_lock; struct cxl_switch_decoder cxlsd; }; @@ -436,6 +451,13 @@ struct cxl_region_params { */ #define CXL_REGION_F_INCOHERENT 0 +/* + * Indicate whether this region has been assembled by autodetection or + * userspace assembly. Prevent endpoint decoders outside of automatic + * detection from being added to the region. + */ +#define CXL_REGION_F_AUTO 1 + /** * struct cxl_region - CXL region * @dev: This region's device @@ -699,6 +721,8 @@ struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct device *dev); #ifdef CONFIG_CXL_REGION bool is_cxl_pmem_region(struct device *dev); struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev); +int cxl_add_to_region(struct cxl_port *root, + struct cxl_endpoint_decoder *cxled); #else static inline bool is_cxl_pmem_region(struct device *dev) { @@ -708,6 +732,11 @@ static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev) { return NULL; } +static inline int cxl_add_to_region(struct cxl_port *root, + struct cxl_endpoint_decoder *cxled) +{ + return 0; +} #endif /* diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index a8d46a67b45e..d88518836c2d 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -30,6 +30,34 @@ static void schedule_detach(void *cxlmd) schedule_cxl_memdev_detach(cxlmd); } +static int discover_region(struct device *dev, void *root) +{ + struct cxl_endpoint_decoder *cxled; + int rc; + + if (!is_endpoint_decoder(dev)) + return 0; + + cxled = to_cxl_endpoint_decoder(dev); + if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0) + return 0; + + if (cxled->state != CXL_DECODER_STATE_AUTO) + return 0; + + /* + * Region enumeration is opportunistic, if this add-event fails, + * continue to the next endpoint decoder. + */ + rc = cxl_add_to_region(root, cxled); + if (rc) + dev_dbg(dev, "failed to add to region: %#llx-%#llx\n", + cxled->cxld.hpa_range.start, cxled->cxld.hpa_range.end); + + return 0; +} + + static int cxl_switch_port_probe(struct cxl_port *port) { struct cxl_hdm *cxlhdm; @@ -54,6 +82,7 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport); struct cxl_dev_state *cxlds = cxlmd->cxlds; struct cxl_hdm *cxlhdm; + struct cxl_port *root; int rc; cxlhdm = devm_cxl_setup_hdm(port); @@ -78,7 +107,24 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) return rc; } - return devm_cxl_enumerate_decoders(cxlhdm); + rc = devm_cxl_enumerate_decoders(cxlhdm); + if (rc) + return rc; + + /* + * This can't fail in practice as CXL root exit unregisters all + * descendant ports and that in turn synchronizes with cxl_port_probe() + */ + root = find_cxl_root(&cxlmd->dev); + + /* + * Now that all endpoint decoders are successfully enumerated, try to + * assemble regions from committed decoders + */ + device_for_each_child(&port->dev, root, discover_region); + put_device(&root->dev); + + return 0; } static int cxl_port_probe(struct device *dev) From patchwork Fri Feb 10 09:06:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652408 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B16A4C636D3 for ; Fri, 10 Feb 2023 09:08:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231628AbjBJJIM (ORCPT ); Fri, 10 Feb 2023 04:08:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231863AbjBJJHc (ORCPT ); Fri, 10 Feb 2023 04:07:32 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3152BDBDF; Fri, 10 Feb 2023 01:06:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676020013; x=1707556013; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Kiq8SiWxreJ/WplvYfP6rIo46j43k+e9EnBH+6Fzm+Q=; b=L2BSlSYB+kESKX/5othgWdaeXwqbvFYTDA3nSKCx425ekiJubiWR2IS/ UZF/hzfKT54qhEIDUUFCERkJaHY6Itk9JfI9Qh4jn/Ao6Sp42jJJ8y8Ui FMYF75+WyVUJX1/wXug49r1z0+TNRUjACY9tRj6lzYODUiVm2F07l9Smv GfbenDy/5GVOnZuWTKSi06Mh4l30rPZhxC9XHQrK51qv4dZEs2FSZeAd8 OEr3n/eBgixzS9ocRtJXOvBuNxo34iuNQcTpg7hhuYQKu4qteYSG8k0Es j5zfXWN727pNhPyv7HN1xuuZFFRE7aDUp2JJAtNU7nVWwrZU2lipMsy3g w==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="328062586" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="328062586" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:52 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="700392732" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="700392732" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:06:51 -0800 Subject: [PATCH v2 15/20] dax/hmem: Move HMAT and Soft reservation probe initcall level From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Fan Ni , vishal.l.verma@intel.com, dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:06:51 -0800 Message-ID: <167602001107.1924368.11562316181038595611.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org In preparation for moving more filtering of "hmem" ranges into the dax_hmem.ko module, update the initcall levels. HMAT range registration moves to subsys_initcall() to be done before Soft Reservation probing, and Soft Reservation probing is moved to device_initcall() to be done before dax_hmem.ko initialization if it is built-in. Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564542109.847146.10113972881782419363.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams Reviewed-by: Dave Jiang --- drivers/acpi/numa/hmat.c | 2 +- drivers/dax/hmem/Makefile | 3 ++- drivers/dax/hmem/device.c | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index 605a0c7053be..ff24282301ab 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -869,4 +869,4 @@ static __init int hmat_init(void) acpi_put_table(tbl); return 0; } -device_initcall(hmat_init); +subsys_initcall(hmat_init); diff --git a/drivers/dax/hmem/Makefile b/drivers/dax/hmem/Makefile index 57377b4c3d47..d4c4cd6bccd7 100644 --- a/drivers/dax/hmem/Makefile +++ b/drivers/dax/hmem/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 -obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o +# device_hmem.o deliberately precedes dax_hmem.o for initcall ordering obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device_hmem.o +obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o device_hmem-y := device.o dax_hmem-y := hmem.o diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c index 903325aac991..20749c7fab81 100644 --- a/drivers/dax/hmem/device.c +++ b/drivers/dax/hmem/device.c @@ -104,4 +104,4 @@ static __init int hmem_init(void) * As this is a fallback for address ranges unclaimed by the ACPI HMAT * parsing it must be at an initcall level greater than hmat_init(). */ -late_initcall(hmem_init); +device_initcall(hmem_init); From patchwork Fri Feb 10 09:07:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652407 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8BF4AC636D3 for ; Fri, 10 Feb 2023 09:08:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231849AbjBJJIa (ORCPT ); Fri, 10 Feb 2023 04:08:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231894AbjBJJHk (ORCPT ); Fri, 10 Feb 2023 04:07:40 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4205F40E9; Fri, 10 Feb 2023 01:07:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676020023; x=1707556023; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fsFyJVK/yMU8PeF2io/JqvsaT5S0XIoYNB1iom5H8o0=; b=JpzUwbbIq7mBgQ76cYlGttvVrllYFRZIMo2utqs504MfwtOzmIKGk11s 9zvC9MDsNi3qJaHVmgZMRyLPlUnDrQhJ03Kv127dIlpjhFW65/EAeem3L 9FLvsfClekL6jF6VFRsRJdaPjAmXbnuNjgQQX2NnNAjjDXkaXuwv5MQ1X nsp+nMGUelTNnT4+autqwJzpZRMsq7PHYjD8kQsCf7kX+77vvJJIMm5Rv CE2voMLjtpZny5aR37vgxgQWqMEJu5H4ejPdjBE2G6B41jWxKKS9s2cKI 9dsrk7Z/dXQvQBLj+cMXJpBiAgojgurO9G2qYBVaA09CAPY757JJ7wXRi w==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="328062621" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="328062621" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:07:02 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="700392901" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="700392901" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:07:02 -0800 Subject: [PATCH v2 17/20] dax/hmem: Convey the dax range via memregion_info() From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Jonathan Cameron , Fan Ni , vishal.l.verma@intel.com, dave.hansen@linux.intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:07:02 -0800 Message-ID: <167602002217.1924368.7036275892522551624.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org In preparation for hmem platform devices to be unregistered, stop using platform_device_add_resources() to convey the address range. The platform_device_add_resources() API causes an existing "Soft Reserved" iomem resource to be re-parented under an inserted platform device resource. When that platform device is deleted it removes the platform device resource and all children. Instead, it is sufficient to convey just the address range and let request_mem_region() insert resources to indicate the devices active in the range. This allows the "Soft Reserved" resource to be re-enumerated upon the next probe event. Reviewed-by: Jonathan Cameron Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564543303.847146.11045895213318648441.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams Reviewed-by: Dave Jiang Reviewed-by: Vishal Verma --- drivers/dax/hmem/device.c | 37 ++++++++++++++----------------------- drivers/dax/hmem/hmem.c | 14 +++----------- include/linux/memregion.h | 2 ++ 3 files changed, 19 insertions(+), 34 deletions(-) diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c index 20749c7fab81..b1b339bccfe5 100644 --- a/drivers/dax/hmem/device.c +++ b/drivers/dax/hmem/device.c @@ -15,15 +15,8 @@ static struct resource hmem_active = { .flags = IORESOURCE_MEM, }; -void hmem_register_device(int target_nid, struct resource *r) +void hmem_register_device(int target_nid, struct resource *res) { - /* define a clean / non-busy resource for the platform device */ - struct resource res = { - .start = r->start, - .end = r->end, - .flags = IORESOURCE_MEM, - .desc = IORES_DESC_SOFT_RESERVED, - }; struct platform_device *pdev; struct memregion_info info; int rc, id; @@ -31,55 +24,53 @@ void hmem_register_device(int target_nid, struct resource *r) if (nohmem) return; - rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM, - IORES_DESC_SOFT_RESERVED); + rc = region_intersects(res->start, resource_size(res), IORESOURCE_MEM, + IORES_DESC_SOFT_RESERVED); if (rc != REGION_INTERSECTS) return; id = memregion_alloc(GFP_KERNEL); if (id < 0) { - pr_err("memregion allocation failure for %pr\n", &res); + pr_err("memregion allocation failure for %pr\n", res); return; } pdev = platform_device_alloc("hmem", id); if (!pdev) { - pr_err("hmem device allocation failure for %pr\n", &res); + pr_err("hmem device allocation failure for %pr\n", res); goto out_pdev; } - if (!__request_region(&hmem_active, res.start, resource_size(&res), + if (!__request_region(&hmem_active, res->start, resource_size(res), dev_name(&pdev->dev), 0)) { - dev_dbg(&pdev->dev, "hmem range %pr already active\n", &res); + dev_dbg(&pdev->dev, "hmem range %pr already active\n", res); goto out_active; } pdev->dev.numa_node = numa_map_to_online_node(target_nid); info = (struct memregion_info) { .target_node = target_nid, + .range = { + .start = res->start, + .end = res->end, + }, }; rc = platform_device_add_data(pdev, &info, sizeof(info)); if (rc < 0) { - pr_err("hmem memregion_info allocation failure for %pr\n", &res); - goto out_resource; - } - - rc = platform_device_add_resources(pdev, &res, 1); - if (rc < 0) { - pr_err("hmem resource allocation failure for %pr\n", &res); + pr_err("hmem memregion_info allocation failure for %pr\n", res); goto out_resource; } rc = platform_device_add(pdev); if (rc < 0) { - dev_err(&pdev->dev, "device add failed for %pr\n", &res); + dev_err(&pdev->dev, "device add failed for %pr\n", res); goto out_resource; } return; out_resource: - __release_region(&hmem_active, res.start, resource_size(&res)); + __release_region(&hmem_active, res->start, resource_size(res)); out_active: platform_device_put(pdev); out_pdev: diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index c7351e0dc8ff..5025a8c9850b 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -15,25 +15,17 @@ static int dax_hmem_probe(struct platform_device *pdev) struct memregion_info *mri; struct dev_dax_data data; struct dev_dax *dev_dax; - struct resource *res; - struct range range; - - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - if (!res) - return -ENOMEM; mri = dev->platform_data; - range.start = res->start; - range.end = res->end; - dax_region = alloc_dax_region(dev, pdev->id, &range, mri->target_node, - PMD_SIZE, 0); + dax_region = alloc_dax_region(dev, pdev->id, &mri->range, + mri->target_node, PMD_SIZE, 0); if (!dax_region) return -ENOMEM; data = (struct dev_dax_data) { .dax_region = dax_region, .id = -1, - .size = region_idle ? 0 : resource_size(res), + .size = region_idle ? 0 : range_len(&mri->range), }; dev_dax = devm_create_dev_dax(&data); if (IS_ERR(dev_dax)) diff --git a/include/linux/memregion.h b/include/linux/memregion.h index bf83363807ac..c01321467789 100644 --- a/include/linux/memregion.h +++ b/include/linux/memregion.h @@ -3,10 +3,12 @@ #define _MEMREGION_H_ #include #include +#include #include struct memregion_info { int target_node; + struct range range; }; #ifdef CONFIG_MEMREGION From patchwork Fri Feb 10 09:07:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 652406 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B8D4C6379F for ; Fri, 10 Feb 2023 09:08:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231929AbjBJJIr (ORCPT ); Fri, 10 Feb 2023 04:08:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231603AbjBJJIL (ORCPT ); Fri, 10 Feb 2023 04:08:11 -0500 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B0065774F; Fri, 10 Feb 2023 01:07:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1676020035; x=1707556035; h=subject:from:to:cc:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9J3LVeqpx/pEaciMjT7PQYSzPAi14MY2qHE7XN4c/uM=; b=mX2PuKEBmdw7+WAlGkfagoHaT47fGxQZ/tH/v55l3OZEwk7RNtteRhNk otHm7xZKshKW5JDnoaDGggNGra2s+BJ6mymlJ0AMacRniOBSjmeB+TGuE MyefR34Z3kmAZvtXGcKhsIHbAZo1C7UfPfB9/n+edIDjOu1SYc8pwoqsR QkbjCVX5tBaPV2TNIMm2pn3J0im/yvnm3DQbrQRgpjkpUqTBsyKKvc8eZ V4yWsfGZDQVkwvGmi77SzkdLeBk9sStWq9XUoIpp3OCP2B6lLxOA63R5M 04f39B1wmX9nzvM77/qQ6VXK1x8zBzIgqeWK8fPBWIgtcvJRiJ5qM27Hr A==; X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="310738835" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="310738835" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:07:14 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10616"; a="661341213" X-IronPort-AV: E=Sophos;i="5.97,286,1669104000"; d="scan'208";a="661341213" Received: from hrchavan-mobl.amr.corp.intel.com (HELO dwillia2-xfh.jf.intel.com) ([10.209.46.42]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Feb 2023 01:07:13 -0800 Subject: [PATCH v2 19/20] dax: Assign RAM regions to memory-hotplug by default From: Dan Williams To: linux-cxl@vger.kernel.org Cc: Michal Hocko , David Hildenbrand , Dave Hansen , Gregory Price , Fan Ni , vishal.l.verma@intel.com, linux-mm@kvack.org, linux-acpi@vger.kernel.org Date: Fri, 10 Feb 2023 01:07:13 -0800 Message-ID: <167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.com> In-Reply-To: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> References: <167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org The default mode for device-dax instances is backwards for RAM-regions as evidenced by the fact that it tends to catch end users by surprise. "Where is my memory?". Recall that platforms are increasingly shipping with performance-differentiated memory pools beyond typical DRAM and NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic interleave, varied media types, and future fabric attached possibilities). For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux 'Soft Reserved') attribute is expected to be applied to all memory-pools that are not the general purpose pool. This designation gives an Operating System a chance to defer usage of a memory pool until later in the boot process where its performance properties can be interrogated and administrator policy can be applied. 'Soft Reserved' memory can be anything from too limited and precious to be part of the general purpose pool (HBM), too slow to host hot kernel data structures (some PMEM media), or anything in between. However, in the absence of an explicit policy, the memory should at least be made usable by default. The current device-dax default hides all non-general-purpose memory behind a device interface. The expectation is that the distribution of users that want the memory online by default vs device-dedicated-access by default follows the Pareto principle. A small number of enlightened users may want to do userspace memory management through a device, but general users just want the kernel to make the memory available with an option to get more advanced later. Arrange for all device-dax instances not backed by PMEM to default to attaching to the dax_kmem driver. From there the baseline memory hotplug policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=) gates whether the memory comes online or stays offline. Where, if it stays offline, it can be reliably converted back to device-mode where it can be partitioned, or fronted by a userspace allocator. So, if someone wants device-dax instances for their 'Soft Reserved' memory: 1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot with memhp_default_state=offline, or roll the dice and hope that the kernel has not pinned a page in that memory before step 2. 2/ Write a udev rule to convert the target dax device(s) from 'system-ram' mode to 'devdax' mode: daxctl reconfigure-device $dax -m devdax -f Cc: Michal Hocko Cc: David Hildenbrand Cc: Dave Hansen Reviewed-by: Gregory Price Tested-by: Fan Ni Link: https://lore.kernel.org/r/167564544513.847146.4645646177864365755.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams Reviewed-by: Dave Jiang Reviewed-by: Vishal Verma --- drivers/dax/Kconfig | 2 +- drivers/dax/bus.c | 53 ++++++++++++++++++++--------------------------- drivers/dax/bus.h | 12 +++++++++-- drivers/dax/device.c | 3 +-- drivers/dax/hmem/hmem.c | 12 ++++++++++- drivers/dax/kmem.c | 1 + 6 files changed, 46 insertions(+), 37 deletions(-) diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig index d13c889c2a64..1163eb62e5f6 100644 --- a/drivers/dax/Kconfig +++ b/drivers/dax/Kconfig @@ -50,7 +50,7 @@ config DEV_DAX_HMEM_DEVICES def_bool y config DEV_DAX_KMEM - tristate "KMEM DAX: volatile-use of persistent memory" + tristate "KMEM DAX: map dax-devices as System-RAM" default DEV_DAX depends on DEV_DAX depends on MEMORY_HOTPLUG # for add_memory() and friends diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 1dad813ee4a6..012d576004e9 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -56,6 +56,25 @@ static int dax_match_id(struct dax_device_driver *dax_drv, struct device *dev) return match; } +static int dax_match_type(struct dax_device_driver *dax_drv, struct device *dev) +{ + enum dax_driver_type type = DAXDRV_DEVICE_TYPE; + struct dev_dax *dev_dax = to_dev_dax(dev); + + if (dev_dax->region->res.flags & IORESOURCE_DAX_KMEM) + type = DAXDRV_KMEM_TYPE; + + if (dax_drv->type == type) + return 1; + + /* default to device mode if dax_kmem is disabled */ + if (dax_drv->type == DAXDRV_DEVICE_TYPE && + !IS_ENABLED(CONFIG_DEV_DAX_KMEM)) + return 1; + + return 0; +} + enum id_action { ID_REMOVE, ID_ADD, @@ -216,14 +235,9 @@ static int dax_bus_match(struct device *dev, struct device_driver *drv) { struct dax_device_driver *dax_drv = to_dax_drv(drv); - /* - * All but the 'device-dax' driver, which has 'match_always' - * set, requires an exact id match. - */ - if (dax_drv->match_always) + if (dax_match_id(dax_drv, dev)) return 1; - - return dax_match_id(dax_drv, dev); + return dax_match_type(dax_drv, dev); } /* @@ -1413,13 +1427,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) } EXPORT_SYMBOL_GPL(devm_create_dev_dax); -static int match_always_count; - int __dax_driver_register(struct dax_device_driver *dax_drv, struct module *module, const char *mod_name) { struct device_driver *drv = &dax_drv->drv; - int rc = 0; /* * dax_bus_probe() calls dax_drv->probe() unconditionally. @@ -1434,26 +1445,7 @@ int __dax_driver_register(struct dax_device_driver *dax_drv, drv->mod_name = mod_name; drv->bus = &dax_bus_type; - /* there can only be one default driver */ - mutex_lock(&dax_bus_lock); - match_always_count += dax_drv->match_always; - if (match_always_count > 1) { - match_always_count--; - WARN_ON(1); - rc = -EINVAL; - } - mutex_unlock(&dax_bus_lock); - if (rc) - return rc; - - rc = driver_register(drv); - if (rc && dax_drv->match_always) { - mutex_lock(&dax_bus_lock); - match_always_count -= dax_drv->match_always; - mutex_unlock(&dax_bus_lock); - } - - return rc; + return driver_register(drv); } EXPORT_SYMBOL_GPL(__dax_driver_register); @@ -1463,7 +1455,6 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv) struct dax_id *dax_id, *_id; mutex_lock(&dax_bus_lock); - match_always_count -= dax_drv->match_always; list_for_each_entry_safe(dax_id, _id, &dax_drv->ids, list) { list_del(&dax_id->list); kfree(dax_id); diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index fbb940293d6d..8cd79ab34292 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -11,7 +11,10 @@ struct dax_device; struct dax_region; void dax_region_put(struct dax_region *dax_region); -#define IORESOURCE_DAX_STATIC (1UL << 0) +/* dax bus specific ioresource flags */ +#define IORESOURCE_DAX_STATIC BIT(0) +#define IORESOURCE_DAX_KMEM BIT(1) + struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct range *range, int target_node, unsigned int align, unsigned long flags); @@ -25,10 +28,15 @@ struct dev_dax_data { struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data); +enum dax_driver_type { + DAXDRV_KMEM_TYPE, + DAXDRV_DEVICE_TYPE, +}; + struct dax_device_driver { struct device_driver drv; struct list_head ids; - int match_always; + enum dax_driver_type type; int (*probe)(struct dev_dax *dev); void (*remove)(struct dev_dax *dev); }; diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 5494d745ced5..ecdff79e31f2 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -475,8 +475,7 @@ EXPORT_SYMBOL_GPL(dev_dax_probe); static struct dax_device_driver device_dax_driver = { .probe = dev_dax_probe, - /* all probe actions are unwound by devm, so .remove isn't necessary */ - .match_always = 1, + .type = DAXDRV_DEVICE_TYPE, }; static int __init dax_init(void) diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index e7bdff3132fa..5ec08f9f8a57 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -11,15 +11,25 @@ module_param_named(region_idle, region_idle, bool, 0644); static int dax_hmem_probe(struct platform_device *pdev) { + unsigned long flags = IORESOURCE_DAX_KMEM; struct device *dev = &pdev->dev; struct dax_region *dax_region; struct memregion_info *mri; struct dev_dax_data data; struct dev_dax *dev_dax; + /* + * @region_idle == true indicates that an administrative agent + * wants to manipulate the range partitioning before the devices + * are created, so do not send them to the dax_kmem driver by + * default. + */ + if (region_idle) + flags = 0; + mri = dev->platform_data; dax_region = alloc_dax_region(dev, pdev->id, &mri->range, - mri->target_node, PMD_SIZE, 0); + mri->target_node, PMD_SIZE, flags); if (!dax_region) return -ENOMEM; diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 4852a2dbdb27..918d01d3fbaa 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -239,6 +239,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax) static struct dax_device_driver device_dax_kmem_driver = { .probe = dev_dax_kmem_probe, .remove = dev_dax_kmem_remove, + .type = DAXDRV_KMEM_TYPE, }; static int __init dax_kmem_init(void)