mbox series

[00/18] cxl: Add support for QTG ID retrieval for CXL subsystem

Message ID 167571650007.587790.10040913293130712882.stgit@djiang5-mobl3.local
Headers show
Series cxl: Add support for QTG ID retrieval for CXL subsystem | expand

Message

Dave Jiang Feb. 6, 2023, 8:49 p.m. UTC
Hi Bjorn, please review the relevant patches to the PCI subsystem: 10/18, 11/18. Thank you!
pcie_get_speed() and pcie_get_width() are created in order to allow CXL driver to calculate
the bandwith and latency for the PCI links.

Hi Rafael, please review the relevant patches to the ACPI: 2/18, 5/18. Thank you!
acpi_ut_verify_cdat_checksum() is exported to allow usage by a driver.

This series adds the retrieval of QoS Throttling Group (QTG) IDs for the CXL Fixed
Memory Window Structure (CFMWS) and the CXL memory device. It provides the QTG IDs
to user space to provide some guidance with putting the proper DPA range under the
appropriate CFMWS window.

The CFMWS structure contains a QTG ID that is associated with the memory window that the
structure exports. On Linux, the CFMWS is represented as a CXL root decoder. The QTG
ID will be attached to the CXL root decoder and exported as a sysfs attribute (qtg_id).

The QTG ID for a device is retrieved via sending a _DSM method to the ACPI0017 device.
The _DSM expects an input package of 4 DWORDS that contains the read latency, write
latency, read bandwidth, and write banwidth. These are the caluclated numbers for the
path between the CXL device and the CXL host bridge (HB). The QTG ID is also exported
as a sysfs attribute under the mem device memory type:
/sys/bus/cxl/devices/memX/ram/qtg_id or /sys/bus/cxl/devices/memX/pmem/qtg_id

The latency numbers are the aggregated latencies for the path between the CXL device and
the CXL HB. If a CXL device is directly attached to the CXL HB, the latency
would be the device latencies from the device Coherent Device Attribute Table (CDAT) plus
the caluclated PCIe link latency between the device and the HB. The bandwidth in this
configuration would be the minimum between the CDAT bandwidth number and link bandwidth
between the device and the HB.

If a configuration has a switch in between, then the latency would be the aggregated
latencies from the device CDAT, the link latency between device and switch, the
latencies from the switch CDAT, and the link latency between switch and the HB. Given
that the switch CDAT is not easily retrieved on Linux currently, a guesstimated
constant number will be used for calculation. The bandwidth calculation would be the
min of device CDAT bandwidth, link bandwith between device and switch, switch CDAT
bandwidth, and link bandwidth between switch and HB.  And without the switch CDAT,
the switch CDAT bandwidth will be skipped.

There can be 0 or more switches between the CXL device and the CXL HB. There are detailed
examples on calculating bandwidth and latency in the CXL Memory Device Software Guide [4].

The CDAT provides Device Scoped Memory Affinity Structures (DSMAS) that contains the
Device Physical Address (DPA) range and the related Device Scoped Latency and Bandwidth
Informat Stuctures (DSLBIS). Each DSLBIS provides a latency or bandwidth entry that is
tied to a DSMAS entry via a per DSMAS unique DSMAD handle.

[1]: https://www.computeexpresslink.org/download-the-specification
[2]: https://uefi.org/sites/default/files/resources/Coherent%20Device%20Attribute%20Table_1.01.pdf
[3]: https://uefi.org/sites/default/files/resources/ACPI_Spec_6_5_Aug29.pdf
[4]: https://cdrdv2-public.intel.com/643805/643805_CXL%20Memory%20Device%20SW%20Guide_Rev1p0.pdf

---

Dave Jiang (18):
      cxl: Export QTG ids from CFMWS to sysfs
      ACPICA: Export acpi_ut_verify_cdat_checksum()
      cxl: Add checksum verification to CDAT from CXL
      cxl: Add common helpers for cdat parsing
      ACPICA: Fix 'struct acpi_cdat_dsmas' spelling mistake
      cxl: Add callback to parse the DSMAS subtables from CDAT
      cxl: Add callback to parse the DSLBIS subtable from CDAT
      cxl: Add support for _DSM Function for retrieving QTG ID
      cxl: Add helper function to retrieve ACPI handle of CXL root device
      PCI: Export pcie_get_speed() using the code from sysfs PCI link speed show function
      PCI: Export pcie_get_width() using the code from sysfs PCI link width show function
      cxl: Add helpers to calculate pci latency for the CXL device
      cxl: Add latency and bandwidth calculations for the CXL path
      cxl: Wait Memory_Info_Valid before access memory related info
      cxl: Move identify and partition query from pci probe to port probe
      cxl: Move reading of CDAT data from device to after media is ready
      cxl: Attach QTG IDs to the DPA ranges for the device
      cxl: Export sysfs attributes for device QTG IDs


 Documentation/ABI/testing/sysfs-bus-cxl |  22 ++++
 drivers/acpi/acpica/utcksum.c           |   4 +-
 drivers/cxl/acpi.c                      |   3 +
 drivers/cxl/core/Makefile               |   2 +
 drivers/cxl/core/acpi.c                 | 129 +++++++++++++++++++
 drivers/cxl/core/cdat.c                 | 157 ++++++++++++++++++++++++
 drivers/cxl/core/cdat.h                 |  15 +++
 drivers/cxl/core/mbox.c                 |   2 +
 drivers/cxl/core/memdev.c               |  26 ++++
 drivers/cxl/core/pci.c                  | 103 +++++++++++++++-
 drivers/cxl/core/port.c                 |  76 ++++++++++++
 drivers/cxl/cxl.h                       |  50 ++++++++
 drivers/cxl/cxlmem.h                    |   2 +
 drivers/cxl/cxlpci.h                    |  14 +++
 drivers/cxl/pci.c                       |   8 --
 drivers/cxl/port.c                      | 105 +++++++++++++++-
 drivers/pci/pci-sysfs.c                 |  21 +---
 drivers/pci/pci.c                       |  40 ++++++
 include/acpi/actbl1.h                   |   7 +-
 include/linux/acpi.h                    |   7 ++
 include/linux/pci.h                     |   2 +
 21 files changed, 760 insertions(+), 35 deletions(-)
 create mode 100644 drivers/cxl/core/acpi.c
 create mode 100644 drivers/cxl/core/cdat.c
 create mode 100644 drivers/cxl/core/cdat.h

--

Comments

Lukas Wunner Feb. 6, 2023, 10 p.m. UTC | #1
On Mon, Feb 06, 2023 at 01:50:06PM -0700, Dave Jiang wrote:
> 'struct acpi_cadt_dsmas' => 'struct acpi_cdat_dsmas'
> 
> Fixes: 51aad1a6723b ("ACPICA: Finish support for the CDAT table")
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

ACPICA changes need to go into upstream first (via a pull request on
GitHub).  Once it's merged, you can submit the same patch downstream
for the kernel and reference the ACPICA commit with a Link: tag.

I've already submitted a pull request for the exact same change more
than a week ago:

https://github.com/acpica/acpica/pull/830

The pull request has been approved but not merged.  Hopefully that'll
happen soon.

Thanks,

Lukas

> ---
>  include/acpi/actbl1.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index 4175dce3967c..e8297cefde09 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -344,7 +344,7 @@ enum acpi_cdat_type {
>  
>  /* Subtable 0: Device Scoped Memory Affinity Structure (DSMAS) */
>  
> -struct acpi_cadt_dsmas {
> +struct acpi_cdat_dsmas {
>  	u8 dsmad_handle;
>  	u8 flags;
>  	u16 reserved;
> 
>
Jonathan Cameron Feb. 9, 2023, 11:15 a.m. UTC | #2
On Mon, 06 Feb 2023 13:49:30 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Export the QoS Throttling Group ID from the CXL Fixed Memory Window
> Structure (CFMWS) under the root decoder sysfs attributes.
> CXL rev3.0 9.17.1.3 CXL Fixed Memory Window Structure (CFMWS)
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Hi Dave,


I've no objection to this, but would good to say why this
might be of use to userspace.  What tooling needs it?

One comment on docs inline. With those two things tidied up
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---
>  Documentation/ABI/testing/sysfs-bus-cxl |    7 +++++++
>  drivers/cxl/acpi.c                      |    3 +++
>  drivers/cxl/core/port.c                 |   14 ++++++++++++++
>  drivers/cxl/cxl.h                       |    3 +++
>  4 files changed, 27 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 8494ef27e8d2..0932c2f6fbf4 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -294,6 +294,13 @@ Description:
>  		(WO) Write a string in the form 'regionZ' to delete that region,
>  		provided it is currently idle / not bound to a driver.
>  
> +What:		/sys/bus/cxl/devices/decoderX.Y/qtg_id
> +Date:		Jan, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) Shows the QoS Throttling Group ID. The QTG ID for a root
> +		decoder comes from the CFMWS structure of the CEDT.

Document the -1 value for no ID in here. Hopefully people will write
their userspace against this document and we want them to know about that
corner case!

>  
>  What:		/sys/bus/cxl/devices/regionZ/uuid
>  Date:		May, 2022
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 13cde44c6086..7a71bb5041c7 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -289,6 +289,9 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  			}
>  		}
>  	}
> +
> +	cxld->qtg_id = cfmws->qtg_id;
> +
>  	rc = cxl_decoder_add(cxld, target_map);
>  err_xormap:
>  	if (rc)
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index b631a0520456..fe78daf7e7c8 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -284,6 +284,16 @@ static ssize_t interleave_ways_show(struct device *dev,
>  
>  static DEVICE_ATTR_RO(interleave_ways);
>  
> +static ssize_t qtg_id_show(struct device *dev,
> +			   struct device_attribute *attr, char *buf)
> +{
> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
> +
> +	return sysfs_emit(buf, "%d\n", cxld->qtg_id);
> +}
> +
> +static DEVICE_ATTR_RO(qtg_id);
> +
>  static struct attribute *cxl_decoder_base_attrs[] = {
>  	&dev_attr_start.attr,
>  	&dev_attr_size.attr,
> @@ -303,6 +313,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>  	&dev_attr_cap_type2.attr,
>  	&dev_attr_cap_type3.attr,
>  	&dev_attr_target_list.attr,
> +	&dev_attr_qtg_id.attr,
>  	SET_CXL_REGION_ATTR(create_pmem_region)
>  	SET_CXL_REGION_ATTR(delete_region)
>  	NULL,
> @@ -1606,6 +1617,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>  	}
>  
>  	atomic_set(&cxlrd->region_id, rc);
> +	cxld->qtg_id = CXL_QTG_ID_INVALID;
>  	return cxlrd;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
> @@ -1643,6 +1655,7 @@ struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
>  
>  	cxld = &cxlsd->cxld;
>  	cxld->dev.type = &cxl_decoder_switch_type;
> +	cxld->qtg_id = CXL_QTG_ID_INVALID;
>  	return cxlsd;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL);
> @@ -1675,6 +1688,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port)
>  	}
>  
>  	cxld->dev.type = &cxl_decoder_endpoint_type;
> +	cxld->qtg_id = CXL_QTG_ID_INVALID;
>  	return cxled;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_alloc, CXL);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1b1cf459ac77..f558bbfc0332 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -279,6 +279,7 @@ enum cxl_decoder_type {
>   */
>  #define CXL_DECODER_MAX_INTERLEAVE 16
>  
> +#define CXL_QTG_ID_INVALID	-1
>  
>  /**
>   * struct cxl_decoder - Common CXL HDM Decoder Attributes
> @@ -290,6 +291,7 @@ enum cxl_decoder_type {
>   * @target_type: accelerator vs expander (type2 vs type3) selector
>   * @region: currently assigned region for this decoder
>   * @flags: memory type capabilities and locking
> + * @qtg_id: QoS Throttling Group ID
>   * @commit: device/decoder-type specific callback to commit settings to hw
>   * @reset: device/decoder-type specific callback to reset hw settings
>  */
> @@ -302,6 +304,7 @@ struct cxl_decoder {
>  	enum cxl_decoder_type target_type;
>  	struct cxl_region *region;
>  	unsigned long flags;
> +	int qtg_id;
>  	int (*commit)(struct cxl_decoder *cxld);
>  	int (*reset)(struct cxl_decoder *cxld);
>  };
> 
>
Jonathan Cameron Feb. 9, 2023, 11:34 a.m. UTC | #3
On Mon, 06 Feb 2023 13:49:48 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> A CDAT table is available from a CXL device. The table is read by the
> driver and cached in software. With the CXL subsystem needing to parse the
> CDAT table, the checksum should be verified. Add checksum verification
> after the CDAT table is read from device.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Hi Dave,

Some comments on this follow on from previous patch so may not
be relevant once you've updated how that is done.

Jonathan

> ---
>  drivers/cxl/core/pci.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 57764e9cd19d..a24dac36bedd 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -3,6 +3,7 @@
>  #include <linux/io-64-nonatomic-lo-hi.h>
>  #include <linux/device.h>
>  #include <linux/delay.h>
> +#include <linux/acpi.h>
>  #include <linux/pci.h>
>  #include <linux/pci-doe.h>
>  #include <cxlpci.h>
> @@ -592,6 +593,7 @@ void read_cdat_data(struct cxl_port *port)
>  	struct device *dev = &port->dev;
>  	struct device *uport = port->uport;
>  	size_t cdat_length;
> +	acpi_status status;
>  	int rc;
>  
>  	cdat_doe = find_cdat_doe(uport);
> @@ -620,5 +622,14 @@ void read_cdat_data(struct cxl_port *port)
>  		port->cdat.length = 0;
>  		dev_err(dev, "CDAT data read error\n");
>  	}
> +
> +	status = acpi_ut_verify_cdat_checksum(port->cdat.table, port->cdat.length);
> +	if (status != AE_OK) {

if (ACPI_FAILURE(acpi_ut...))  or better still put that in the wrapper I suggeste
in previous patch so that we have normal kernel return code handling out here.


> +		/* Don't leave table data allocated on error */
> +		devm_kfree(dev, port->cdat.table);
> +		port->cdat.table = NULL;
> +		port->cdat.length = 0;

I'd rather see us manipulate a local copy of cdat_length, and cdat_table
then only assign them to the port->cdat fields the success path rather
than setting them then unsetting the on error.

Diff will be bigger, but nicer resulting code (and hopefully diff
won't be too big!)


> +		dev_err(dev, "CDAT data checksum error\n");
> +	}
>  }
>  EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
> 
>
Jonathan Cameron Feb. 9, 2023, 1:50 p.m. UTC | #4
On Mon, 06 Feb 2023 13:50:23 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Provide a callback to parse the Device Scoped Latency and Bandwidth
> Information Structure (DSLBIS) in the CDAT structures. The DSLBIS
> contains the bandwidth and latency information that's tied to a DSMAS
> handle. The driver will retrieve the read and write latency and
> bandwidth associated with the DSMAS which is tied to a DPA range.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
A few comments inline,

Thanks,

Jonathan

> ---
>  drivers/cxl/core/cdat.c |   34 ++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h       |    2 ++
>  drivers/cxl/port.c      |    9 ++++++++-
>  include/acpi/actbl1.h   |    5 +++++
>  4 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index f9a64a0f1ee4..3c8f3956487e 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -121,3 +121,37 @@ int cxl_dsmas_parse_entry(struct acpi_cdat_header *header, void *arg)
>  	return 0;
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_dsmas_parse_entry, CXL);
> +
> +int cxl_dslbis_parse_entry(struct acpi_cdat_header *header, void *arg)
> +{
> +	struct cxl_port *port = (struct cxl_port *)arg;
> +	struct dsmas_entry *dent;
> +	struct acpi_cdat_dslbis *dslbis;

Perhaps reorder to maintain the pretty upside-down Christmas trees
(I don't care :)

> +	u64 val;
> +
> +	if (header->type != ACPI_CDAT_TYPE_DSLBIS)
> +		return -EINVAL;

Isn't this guaranteed by the caller?  Seems overkill do it twice
and I don't think these will ever be called outside of that wrapper that
loops over the entries. I could be wrong though!

> +
> +	dslbis = (struct acpi_cdat_dslbis *)((unsigned long)header + sizeof(*header));
header + 1

> +	if ((dslbis->flags & ACPI_CEDT_DSLBIS_MEM_MASK) !=

This field 'must be ignored' if the DSMAS handle isn't a match
(as it's an initiator only entry) Odd though it may seem I think we
might see one of those on a type 3 device and we are probably going to
have other users of this function anyway.

I think you need to do the walk below to check we have a DSMAS match, before
running this check.

> +	     ACPI_CEDT_DSLBIS_MEM_MEMORY)
> +		return 0;
> +
> +	if (dslbis->data_type > ACPI_HMAT_WRITE_BANDWIDTH)
> +		return -ENXIO;

This would probably imply a new HMAT spec value, so probably just
log it and ignore rather than error out.

> +
> +	/* Value calculation with base_unit, see ACPI Spec 6.5 5.2.28.4 */
> +	val = dslbis->entry[0] * dslbis->entry_base_unit;

In theory this might overflow as u64 * u16.
Doubt it will ever happen in reality, but maybe a check and debug print if it does?

> +
> +	mutex_lock(&port->cdat.dsmas_lock);
> +	list_for_each_entry(dent, &port->cdat.dsmas_list, list) {
> +		if (dslbis->handle == dent->handle) {
> +			dent->qos[dslbis->data_type] = val;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&port->cdat.dsmas_lock);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_dslbis_parse_entry, CXL);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1e5e69f08480..849b22236f1d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -705,6 +705,7 @@ struct dsmas_entry {
>  	struct list_head list;
>  	struct range dpa_range;
>  	u16 handle;
> +	u64 qos[ACPI_HMAT_WRITE_BANDWIDTH + 1];
>  };
>  
>  typedef int (*cdat_tbl_entry_handler)(struct acpi_cdat_header *header, void *arg);
> @@ -716,6 +717,7 @@ int cdat_table_parse_dslbis(void *table, cdat_tbl_entry_handler handler,
>  			    void *arg);
>  
>  int cxl_dsmas_parse_entry(struct acpi_cdat_header *header, void *arg);
> +int cxl_dslbis_parse_entry(struct acpi_cdat_header *header, void *arg);
>  
>  /*
>   * Unit test builds overrides this to __weak, find the 'strong' version
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index b1da73e99bab..8de311208b37 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -65,8 +65,15 @@ static int cxl_port_probe(struct device *dev)
>  			rc = cdat_table_parse_dsmas(port->cdat.table,
>  						    cxl_dsmas_parse_entry,
>  						    (void *)port);
> -			if (rc < 0)
> +			if (rc > 0) {
> +				rc = cdat_table_parse_dslbis(port->cdat.table,
> +							     cxl_dslbis_parse_entry,
> +							     (void *)port);
> +				if (rc <= 0)
> +					dev_dbg(dev, "Failed to parse DSLBIS: %d\n", rc);

If we have entries and they won't parse, I think we should be screaming louder.
dev_warn() would be my preference for this and the one in the previous patch.
Sure we can carry on, but something on the device is not working as expected.

> +			} else {
>  				dev_dbg(dev, "Failed to parse DSMAS: %d\n", rc);
> +			}
>  		}
>  
>  		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index e8297cefde09..ff6092e45196 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -369,6 +369,11 @@ struct acpi_cdat_dslbis {
>  	u16 reserved2;
>  };
>  
> +/* Flags for subtable above */
> +
> +#define ACPI_CEDT_DSLBIS_MEM_MASK	GENMASK(3, 0)
> +#define ACPI_CEDT_DSLBIS_MEM_MEMORY	0
> +
>  /* Subtable 2: Device Scoped Memory Side Cache Information Structure (DSMSCIS) */
>  
>  struct acpi_cdat_dsmscis {
> 
>
Jonathan Cameron Feb. 9, 2023, 2:10 p.m. UTC | #5
On Mon, 06 Feb 2023 13:50:42 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Provide a helper to find the ACPI0017 device in order to issue the _DSM.
> The helper will take the 'struct device' from a cxl_port and iterate until
> the root device is reached. The ACPI handle will be returned from the root
> device.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/cxl/core/acpi.c |   30 ++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h       |    1 +
>  2 files changed, 31 insertions(+)
> 
> diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
> index 86dc6c9c1f24..05fcd4751619 100644
> --- a/drivers/cxl/core/acpi.c
> +++ b/drivers/cxl/core/acpi.c
> @@ -5,6 +5,7 @@
>  #include <linux/kernel.h>
>  #include <linux/acpi.h>
>  #include <linux/pci.h>
> +#include <linux/platform_device.h>
>  #include <asm/div64.h>
>  #include "cxlpci.h"
>  #include "cxl.h"
> @@ -13,6 +14,35 @@ const guid_t acpi_cxl_qtg_id_guid =
>  	GUID_INIT(0xF365F9A6, 0xA7DE, 0x4071,
>  		  0xA6, 0x6A, 0xB4, 0x0C, 0x0B, 0x4F, 0x8E, 0x52);
>  
> +/**
> + * cxl_acpi_get_root_acpi_handle - get the ACPI handle of the CXL root device
> + * @dev: 'struct device' to start searching from. Should be from cxl_port->dev.
> + * Looks for the ACPI0017 device and return the ACPI handle
> + **/

Inconsistent comment style.

> +acpi_handle cxl_acpi_get_rootdev_handle(struct device *dev)
> +{
> +	struct device *itr = dev, *root_dev;

Not nice for readability to have an assignment in a list of definitions
all on the same line.

> +	acpi_handle handle;
> +
> +	if (!dev)
> +		return ERR_PTR(-EINVAL);
> +
> +	while (itr->parent) {
> +		root_dev = itr;
> +		itr = itr->parent;
> +	}
> +
> +	if (!dev_is_platform(root_dev))
> +		return ERR_PTR(-ENODEV);
> +
> +	handle = ACPI_HANDLE(root_dev);
> +	if (!handle)
> +		return ERR_PTR(-ENODEV);
> +
> +	return handle;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_acpi_get_rootdev_handle, CXL);
> +
>  /**
>   * cxl_acpi_evaluate_qtg_dsm - Retrieve QTG ids via ACPI _DSM
>   * @handle: ACPI handle
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index e70df07f9b4b..ac6ea550ab0a 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -733,6 +733,7 @@ struct qtg_dsm_output {
>  
>  struct qtg_dsm_output *cxl_acpi_evaluate_qtg_dsm(acpi_handle handle,
>  						 struct qtg_dsm_input *input);
> +acpi_handle cxl_acpi_get_rootdev_handle(struct device *dev);
>  
>  /*
>   * Unit test builds overrides this to __weak, find the 'strong' version
> 
>
Jonathan Cameron Feb. 9, 2023, 3:24 p.m. UTC | #6
On Mon, 06 Feb 2023 13:51:19 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> CXL Memory Device SW Guide rev1.0 2.11.2 provides instruction on how to
> caluclate latency and bandwidth for CXL memory device. Calculate minimum

Spell check your descriptions (I often forget to do this as well!
)
> bandwidth and total latency for the path from the CXL device to the root
> port. The calculates values are stored in the cached DSMAS entries attached
> to the cxl_port of the CXL device.
> 
> For example for a device that is directly attached to a host bus:
> Total Latency = Device Latency (from CDAT) + Dev to Host Bus (HB) Link
> 		Latency
> Min Bandwidth = Link Bandwidth between Host Bus and CXL device
> 
> For a device that has a switch in between host bus and CXL device:
> Total Latency = Device (CDAT) Latency + Dev to Switch Link Latency +
> 		Switch (CDAT) Latency + Switch to HB Link Latency

For QTG purposes, are we also supposed to take into account HB to
system interconnect type latency (or maybe nearest CPU?).
That is likely to be non trivial.

> Min Bandwidth = min(dev to switch bandwidth, switch to HB bandwidth)
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Stray sign off.

> 
> The internal latency for a switch can be retrieved from the CDAT of the
> switch PCI device. However, since there's no easy way to retrieve that
> right now on Linux, a guesstimated constant is used per switch to simplify
> the driver code.

I'd like to see that gap closed asap. I think it is fairly obvious how to do
it, so shouldn't be too hard, just needs a dance to get the DOE for a switch
port using Lukas' updated handling of DOE mailboxes. 

> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/cxl/core/port.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/cxl/cxl.h       |    9 +++++++
>  drivers/cxl/port.c      |   42 +++++++++++++++++++++++++++++++++
>  3 files changed, 111 insertions(+)
> 
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 2b27319cfd42..aa260361ba7d 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1899,6 +1899,66 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
>  }
>  EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
>  
> +int cxl_port_get_downstream_qos(struct cxl_port *port, long *bw, long *lat)
> +{
> +	long total_lat = 0, latency;

Similar to before, not good for readability to hide asignments in a list all on one line.
Jonathan Cameron Feb. 9, 2023, 3:29 p.m. UTC | #7
On Mon, 06 Feb 2023 13:51:37 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Move the enumeration of device capacity to cxl_port_probe() from
> cxl_pci_probe(). The size and capacity information should be read
> after cxl_await_media_ready() so the data is valid.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Fix?

> ---
>  drivers/cxl/pci.c  |    8 --------
>  drivers/cxl/port.c |    8 ++++++++
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 258004f34281..e35ed250214e 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -484,14 +484,6 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (rc)
>  		return rc;
>  
> -	rc = cxl_dev_state_identify(cxlds);
> -	if (rc)
> -		return rc;
> -
> -	rc = cxl_mem_create_range_info(cxlds);
> -	if (rc)
> -		return rc;
> -
>  	cxlmd = devm_cxl_add_memdev(cxlds);
>  	if (IS_ERR(cxlmd))
>  		return PTR_ERR(cxlmd);
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 03380c18fc52..b7a4a1be2945 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -127,6 +127,14 @@ static int cxl_port_probe(struct device *dev)
>  			if (rc)
>  				dev_dbg(dev, "Failed to do QoS calculations\n");
>  		}
> +
> +		rc = cxl_dev_state_identify(cxlds);
> +		if (rc)
> +			return rc;
> +
> +		rc = cxl_mem_create_range_info(cxlds);
> +		if (rc)
> +			return rc;
>  	}
>  
>  	rc = devm_cxl_enumerate_decoders(cxlhdm);
> 
>
Jonathan Cameron Feb. 9, 2023, 3:34 p.m. UTC | #8
On Mon, 06 Feb 2023 13:51:55 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> Match the DPA ranges of the mem device and the calcuated DPA range attached
> to the DSMAS. If a match is found, then assign the QTG ID to the relevant
> DPA range of the memory device.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/cxl/core/mbox.c |    2 ++
>  drivers/cxl/cxlmem.h    |    2 ++
>  drivers/cxl/port.c      |   35 +++++++++++++++++++++++++++++++++++
>  3 files changed, 39 insertions(+)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index b03fba212799..2a7b07d65010 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -869,6 +869,8 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
>  
>  	mutex_init(&cxlds->mbox_mutex);
>  	cxlds->dev = dev;
> +	cxlds->pmem_qtg_id = -1;
> +	cxlds->ram_qtg_id = -1;
>  
>  	return cxlds;
>  }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index ab138004f644..d88b88ecc807 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -251,6 +251,8 @@ struct cxl_dev_state {
>  	struct resource dpa_res;
>  	struct resource pmem_res;
>  	struct resource ram_res;
> +	int pmem_qtg_id;
> +	int ram_qtg_id;
>  	u64 total_bytes;
>  	u64 volatile_only_bytes;
>  	u64 persistent_only_bytes;
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 6b2ad22487f5..c4cee69d6625 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -68,6 +68,39 @@ static int cxl_port_qos_calculate(struct cxl_port *port)
>  	return 0;
>  }
>  
> +static bool dpa_match_qtg_range(struct range *dpa, struct range *qtg)
> +{
> +	if (dpa->start >= qtg->start && dpa->end <= qtg->end)
> +		return true;
> +	return false;

	return dpa->start >= qtg->start && dpa->end <= qtg-end;

> +}
> +
> +static void cxl_dev_set_qtg(struct cxl_port *port, struct cxl_dev_state *cxlds)
> +{
> +	struct dsmas_entry *dent;
> +	struct range ram_range = {
> +		.start = cxlds->ram_res.start,
> +		.end = cxlds->ram_res.end,
> +	};
> +	struct range pmem_range =  {
> +		.start = cxlds->pmem_res.start,
> +		.end = cxlds->pmem_res.end,
> +	};
> +
> +	mutex_lock(&port->cdat.dsmas_lock);
> +	list_for_each_entry(dent, &port->cdat.dsmas_list, list) {
> +		if (dpa_match_qtg_range(&ram_range, &dent->dpa_range)) {
> +			cxlds->ram_qtg_id = dent->qtg_id;
> +			break;
> +		}
> +		if (dpa_match_qtg_range(&pmem_range, &dent->dpa_range)) {
> +			cxlds->pmem_qtg_id = dent->qtg_id;

Could be multiple ranges in ram and pmem. I guess we can leave that for
future work.

> +			break;
> +		}
> +	}
> +	mutex_unlock(&port->cdat.dsmas_lock);
> +}
> +
>  static int cxl_port_probe(struct device *dev)
>  {
>  	struct cxl_port *port = to_cxl_port(dev);
> @@ -134,6 +167,8 @@ static int cxl_port_probe(struct device *dev)
>  		rc = cxl_mem_create_range_info(cxlds);
>  		if (rc)
>  			return rc;
> +
> +		cxl_dev_set_qtg(port, cxlds);
>  	}
>  
>  	rc = devm_cxl_enumerate_decoders(cxlhdm);
> 
>
Dave Jiang Feb. 9, 2023, 5:28 p.m. UTC | #9
On 2/9/23 4:15 AM, Jonathan Cameron wrote:
> On Mon, 06 Feb 2023 13:49:30 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> Export the QoS Throttling Group ID from the CXL Fixed Memory Window
>> Structure (CFMWS) under the root decoder sysfs attributes.
>> CXL rev3.0 9.17.1.3 CXL Fixed Memory Window Structure (CFMWS)
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> Hi Dave,
> 
> 
> I've no objection to this, but would good to say why this
> might be of use to userspace.  What tooling needs it?

Will do.

> 
> One comment on docs inline. With those two things tidied up
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> 
>> ---
>>   Documentation/ABI/testing/sysfs-bus-cxl |    7 +++++++
>>   drivers/cxl/acpi.c                      |    3 +++
>>   drivers/cxl/core/port.c                 |   14 ++++++++++++++
>>   drivers/cxl/cxl.h                       |    3 +++
>>   4 files changed, 27 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
>> index 8494ef27e8d2..0932c2f6fbf4 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-cxl
>> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
>> @@ -294,6 +294,13 @@ Description:
>>   		(WO) Write a string in the form 'regionZ' to delete that region,
>>   		provided it is currently idle / not bound to a driver.
>>   
>> +What:		/sys/bus/cxl/devices/decoderX.Y/qtg_id
>> +Date:		Jan, 2023
>> +KernelVersion:	v6.3
>> +Contact:	linux-cxl@vger.kernel.org
>> +Description:
>> +		(RO) Shows the QoS Throttling Group ID. The QTG ID for a root
>> +		decoder comes from the CFMWS structure of the CEDT.
> 
> Document the -1 value for no ID in here. Hopefully people will write
> their userspace against this document and we want them to know about that
> corner case!

Ok I will add.

> 
>>   
>>   What:		/sys/bus/cxl/devices/regionZ/uuid
>>   Date:		May, 2022
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 13cde44c6086..7a71bb5041c7 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -289,6 +289,9 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>>   			}
>>   		}
>>   	}
>> +
>> +	cxld->qtg_id = cfmws->qtg_id;
>> +
>>   	rc = cxl_decoder_add(cxld, target_map);
>>   err_xormap:
>>   	if (rc)
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index b631a0520456..fe78daf7e7c8 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -284,6 +284,16 @@ static ssize_t interleave_ways_show(struct device *dev,
>>   
>>   static DEVICE_ATTR_RO(interleave_ways);
>>   
>> +static ssize_t qtg_id_show(struct device *dev,
>> +			   struct device_attribute *attr, char *buf)
>> +{
>> +	struct cxl_decoder *cxld = to_cxl_decoder(dev);
>> +
>> +	return sysfs_emit(buf, "%d\n", cxld->qtg_id);
>> +}
>> +
>> +static DEVICE_ATTR_RO(qtg_id);
>> +
>>   static struct attribute *cxl_decoder_base_attrs[] = {
>>   	&dev_attr_start.attr,
>>   	&dev_attr_size.attr,
>> @@ -303,6 +313,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>>   	&dev_attr_cap_type2.attr,
>>   	&dev_attr_cap_type3.attr,
>>   	&dev_attr_target_list.attr,
>> +	&dev_attr_qtg_id.attr,
>>   	SET_CXL_REGION_ATTR(create_pmem_region)
>>   	SET_CXL_REGION_ATTR(delete_region)
>>   	NULL,
>> @@ -1606,6 +1617,7 @@ struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>>   	}
>>   
>>   	atomic_set(&cxlrd->region_id, rc);
>> +	cxld->qtg_id = CXL_QTG_ID_INVALID;
>>   	return cxlrd;
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_root_decoder_alloc, CXL);
>> @@ -1643,6 +1655,7 @@ struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
>>   
>>   	cxld = &cxlsd->cxld;
>>   	cxld->dev.type = &cxl_decoder_switch_type;
>> +	cxld->qtg_id = CXL_QTG_ID_INVALID;
>>   	return cxlsd;
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_switch_decoder_alloc, CXL);
>> @@ -1675,6 +1688,7 @@ struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port)
>>   	}
>>   
>>   	cxld->dev.type = &cxl_decoder_endpoint_type;
>> +	cxld->qtg_id = CXL_QTG_ID_INVALID;
>>   	return cxled;
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_alloc, CXL);
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 1b1cf459ac77..f558bbfc0332 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -279,6 +279,7 @@ enum cxl_decoder_type {
>>    */
>>   #define CXL_DECODER_MAX_INTERLEAVE 16
>>   
>> +#define CXL_QTG_ID_INVALID	-1
>>   
>>   /**
>>    * struct cxl_decoder - Common CXL HDM Decoder Attributes
>> @@ -290,6 +291,7 @@ enum cxl_decoder_type {
>>    * @target_type: accelerator vs expander (type2 vs type3) selector
>>    * @region: currently assigned region for this decoder
>>    * @flags: memory type capabilities and locking
>> + * @qtg_id: QoS Throttling Group ID
>>    * @commit: device/decoder-type specific callback to commit settings to hw
>>    * @reset: device/decoder-type specific callback to reset hw settings
>>   */
>> @@ -302,6 +304,7 @@ struct cxl_decoder {
>>   	enum cxl_decoder_type target_type;
>>   	struct cxl_region *region;
>>   	unsigned long flags;
>> +	int qtg_id;
>>   	int (*commit)(struct cxl_decoder *cxld);
>>   	int (*reset)(struct cxl_decoder *cxld);
>>   };
>>
>>
>
Dave Jiang Feb. 9, 2023, 5:31 p.m. UTC | #10
On 2/9/23 4:34 AM, Jonathan Cameron wrote:
> On Mon, 06 Feb 2023 13:49:48 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> A CDAT table is available from a CXL device. The table is read by the
>> driver and cached in software. With the CXL subsystem needing to parse the
>> CDAT table, the checksum should be verified. Add checksum verification
>> after the CDAT table is read from device.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Hi Dave,
> 
> Some comments on this follow on from previous patch so may not
> be relevant once you've updated how that is done.

Dan advised to just drop the ACPICA changes and just do it locally since 
the verification code is tiny and simple.

> 
> Jonathan
> 
>> ---
>>   drivers/cxl/core/pci.c |   11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>> index 57764e9cd19d..a24dac36bedd 100644
>> --- a/drivers/cxl/core/pci.c
>> +++ b/drivers/cxl/core/pci.c
>> @@ -3,6 +3,7 @@
>>   #include <linux/io-64-nonatomic-lo-hi.h>
>>   #include <linux/device.h>
>>   #include <linux/delay.h>
>> +#include <linux/acpi.h>
>>   #include <linux/pci.h>
>>   #include <linux/pci-doe.h>
>>   #include <cxlpci.h>
>> @@ -592,6 +593,7 @@ void read_cdat_data(struct cxl_port *port)
>>   	struct device *dev = &port->dev;
>>   	struct device *uport = port->uport;
>>   	size_t cdat_length;
>> +	acpi_status status;
>>   	int rc;
>>   
>>   	cdat_doe = find_cdat_doe(uport);
>> @@ -620,5 +622,14 @@ void read_cdat_data(struct cxl_port *port)
>>   		port->cdat.length = 0;
>>   		dev_err(dev, "CDAT data read error\n");
>>   	}
>> +
>> +	status = acpi_ut_verify_cdat_checksum(port->cdat.table, port->cdat.length);
>> +	if (status != AE_OK) {
> 
> if (ACPI_FAILURE(acpi_ut...))  or better still put that in the wrapper I suggeste
> in previous patch so that we have normal kernel return code handling out here.
> 
> 
>> +		/* Don't leave table data allocated on error */
>> +		devm_kfree(dev, port->cdat.table);
>> +		port->cdat.table = NULL;
>> +		port->cdat.length = 0;
> 
> I'd rather see us manipulate a local copy of cdat_length, and cdat_table
> then only assign them to the port->cdat fields the success path rather
> than setting them then unsetting the on error.
> 
> Diff will be bigger, but nicer resulting code (and hopefully diff
> won't be too big!)

Ok, I'll create a prep patch to change this as you suggested.

> 
> 
>> +		dev_err(dev, "CDAT data checksum error\n");
>> +	}
>>   }
>>   EXPORT_SYMBOL_NS_GPL(read_cdat_data, CXL);
>>
>>
>
Dave Jiang Feb. 14, 2023, 12:24 a.m. UTC | #11
On 2/9/23 6:50 AM, Jonathan Cameron wrote:
> On Mon, 06 Feb 2023 13:50:23 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> Provide a callback to parse the Device Scoped Latency and Bandwidth
>> Information Structure (DSLBIS) in the CDAT structures. The DSLBIS
>> contains the bandwidth and latency information that's tied to a DSMAS
>> handle. The driver will retrieve the read and write latency and
>> bandwidth associated with the DSMAS which is tied to a DPA range.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> A few comments inline,
> 
> Thanks,
> 
> Jonathan
> 
>> ---
>>   drivers/cxl/core/cdat.c |   34 ++++++++++++++++++++++++++++++++++
>>   drivers/cxl/cxl.h       |    2 ++
>>   drivers/cxl/port.c      |    9 ++++++++-
>>   include/acpi/actbl1.h   |    5 +++++
>>   4 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
>> index f9a64a0f1ee4..3c8f3956487e 100644
>> --- a/drivers/cxl/core/cdat.c
>> +++ b/drivers/cxl/core/cdat.c
>> @@ -121,3 +121,37 @@ int cxl_dsmas_parse_entry(struct acpi_cdat_header *header, void *arg)
>>   	return 0;
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_dsmas_parse_entry, CXL);
>> +
>> +int cxl_dslbis_parse_entry(struct acpi_cdat_header *header, void *arg)
>> +{
>> +	struct cxl_port *port = (struct cxl_port *)arg;
>> +	struct dsmas_entry *dent;
>> +	struct acpi_cdat_dslbis *dslbis;
> 
> Perhaps reorder to maintain the pretty upside-down Christmas trees
> (I don't care :)

will fix
> 
>> +	u64 val;
>> +
>> +	if (header->type != ACPI_CDAT_TYPE_DSLBIS)
>> +		return -EINVAL;
> 
> Isn't this guaranteed by the caller?  Seems overkill do it twice
> and I don't think these will ever be called outside of that wrapper that
> loops over the entries. I could be wrong though!
>

ok will remove


>> +
>> +	dslbis = (struct acpi_cdat_dslbis *)((unsigned long)header + sizeof(*header));
> header + 1
> 
>> +	if ((dslbis->flags & ACPI_CEDT_DSLBIS_MEM_MASK) !=
> 
> This field 'must be ignored' if the DSMAS handle isn't a match
> (as it's an initiator only entry) Odd though it may seem I think we
> might see one of those on a type 3 device and we are probably going to
> have other users of this function anyway.
> 
> I think you need to do the walk below to check we have a DSMAS match, before
> running this check.

ok, will move down to where entry is matched

> 
>> +	     ACPI_CEDT_DSLBIS_MEM_MEMORY)
>> +		return 0;
>> +
>> +	if (dslbis->data_type > ACPI_HMAT_WRITE_BANDWIDTH)
>> +		return -ENXIO;
> 
> This would probably imply a new HMAT spec value, so probably just
> log it and ignore rather than error out.

ok

> 
>> +
>> +	/* Value calculation with base_unit, see ACPI Spec 6.5 5.2.28.4 */
>> +	val = dslbis->entry[0] * dslbis->entry_base_unit;
> 
> In theory this might overflow as u64 * u16.
> Doubt it will ever happen in reality, but maybe a check and debug print if it does?

ok will use check_mul_overflow()
> 
>> +
>> +	mutex_lock(&port->cdat.dsmas_lock);
>> +	list_for_each_entry(dent, &port->cdat.dsmas_list, list) {
>> +		if (dslbis->handle == dent->handle) {
>> +			dent->qos[dslbis->data_type] = val;
>> +			break;
>> +		}
>> +	}
>> +	mutex_unlock(&port->cdat.dsmas_lock);
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_dslbis_parse_entry, CXL);
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 1e5e69f08480..849b22236f1d 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -705,6 +705,7 @@ struct dsmas_entry {
>>   	struct list_head list;
>>   	struct range dpa_range;
>>   	u16 handle;
>> +	u64 qos[ACPI_HMAT_WRITE_BANDWIDTH + 1];
>>   };
>>   
>>   typedef int (*cdat_tbl_entry_handler)(struct acpi_cdat_header *header, void *arg);
>> @@ -716,6 +717,7 @@ int cdat_table_parse_dslbis(void *table, cdat_tbl_entry_handler handler,
>>   			    void *arg);
>>   
>>   int cxl_dsmas_parse_entry(struct acpi_cdat_header *header, void *arg);
>> +int cxl_dslbis_parse_entry(struct acpi_cdat_header *header, void *arg);
>>   
>>   /*
>>    * Unit test builds overrides this to __weak, find the 'strong' version
>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
>> index b1da73e99bab..8de311208b37 100644
>> --- a/drivers/cxl/port.c
>> +++ b/drivers/cxl/port.c
>> @@ -65,8 +65,15 @@ static int cxl_port_probe(struct device *dev)
>>   			rc = cdat_table_parse_dsmas(port->cdat.table,
>>   						    cxl_dsmas_parse_entry,
>>   						    (void *)port);
>> -			if (rc < 0)
>> +			if (rc > 0) {
>> +				rc = cdat_table_parse_dslbis(port->cdat.table,
>> +							     cxl_dslbis_parse_entry,
>> +							     (void *)port);
>> +				if (rc <= 0)
>> +					dev_dbg(dev, "Failed to parse DSLBIS: %d\n", rc);
> 
> If we have entries and they won't parse, I think we should be screaming louder.
> dev_warn() would be my preference for this and the one in the previous patch.
> Sure we can carry on, but something on the device is not working as expected.

ok will fix this one and previous.

> 
>> +			} else {
>>   				dev_dbg(dev, "Failed to parse DSMAS: %d\n", rc);
>> +			}
>>   		}
>>   
>>   		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
>> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
>> index e8297cefde09..ff6092e45196 100644
>> --- a/include/acpi/actbl1.h
>> +++ b/include/acpi/actbl1.h
>> @@ -369,6 +369,11 @@ struct acpi_cdat_dslbis {
>>   	u16 reserved2;
>>   };
>>   
>> +/* Flags for subtable above */
>> +
>> +#define ACPI_CEDT_DSLBIS_MEM_MASK	GENMASK(3, 0)
>> +#define ACPI_CEDT_DSLBIS_MEM_MEMORY	0
>> +
>>   /* Subtable 2: Device Scoped Memory Side Cache Information Structure (DSMSCIS) */
>>   
>>   struct acpi_cdat_dsmscis {
>>
>>
>
Dave Jiang Feb. 14, 2023, 9:29 p.m. UTC | #12
On 2/9/23 7:10 AM, Jonathan Cameron wrote:
> On Mon, 06 Feb 2023 13:50:42 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> Provide a helper to find the ACPI0017 device in order to issue the _DSM.
>> The helper will take the 'struct device' from a cxl_port and iterate until
>> the root device is reached. The ACPI handle will be returned from the root
>> device.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>   drivers/cxl/core/acpi.c |   30 ++++++++++++++++++++++++++++++
>>   drivers/cxl/cxl.h       |    1 +
>>   2 files changed, 31 insertions(+)
>>
>> diff --git a/drivers/cxl/core/acpi.c b/drivers/cxl/core/acpi.c
>> index 86dc6c9c1f24..05fcd4751619 100644
>> --- a/drivers/cxl/core/acpi.c
>> +++ b/drivers/cxl/core/acpi.c
>> @@ -5,6 +5,7 @@
>>   #include <linux/kernel.h>
>>   #include <linux/acpi.h>
>>   #include <linux/pci.h>
>> +#include <linux/platform_device.h>
>>   #include <asm/div64.h>
>>   #include "cxlpci.h"
>>   #include "cxl.h"
>> @@ -13,6 +14,35 @@ const guid_t acpi_cxl_qtg_id_guid =
>>   	GUID_INIT(0xF365F9A6, 0xA7DE, 0x4071,
>>   		  0xA6, 0x6A, 0xB4, 0x0C, 0x0B, 0x4F, 0x8E, 0x52);
>>   
>> +/**
>> + * cxl_acpi_get_root_acpi_handle - get the ACPI handle of the CXL root device
>> + * @dev: 'struct device' to start searching from. Should be from cxl_port->dev.
>> + * Looks for the ACPI0017 device and return the ACPI handle
>> + **/
> 
> Inconsistent comment style.

ok
> 
>> +acpi_handle cxl_acpi_get_rootdev_handle(struct device *dev)
>> +{
>> +	struct device *itr = dev, *root_dev;
> 
> Not nice for readability to have an assignment in a list of definitions
> all on the same line.

ok
> 
>> +	acpi_handle handle;
>> +
>> +	if (!dev)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	while (itr->parent) {
>> +		root_dev = itr;
>> +		itr = itr->parent;
>> +	}
>> +
>> +	if (!dev_is_platform(root_dev))
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	handle = ACPI_HANDLE(root_dev);
>> +	if (!handle)
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	return handle;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_acpi_get_rootdev_handle, CXL);
>> +
>>   /**
>>    * cxl_acpi_evaluate_qtg_dsm - Retrieve QTG ids via ACPI _DSM
>>    * @handle: ACPI handle
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index e70df07f9b4b..ac6ea550ab0a 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -733,6 +733,7 @@ struct qtg_dsm_output {
>>   
>>   struct qtg_dsm_output *cxl_acpi_evaluate_qtg_dsm(acpi_handle handle,
>>   						 struct qtg_dsm_input *input);
>> +acpi_handle cxl_acpi_get_rootdev_handle(struct device *dev);
>>   
>>   /*
>>    * Unit test builds overrides this to __weak, find the 'strong' version
>>
>>
>
Dave Jiang Feb. 14, 2023, 11:03 p.m. UTC | #13
On 2/9/23 8:24 AM, Jonathan Cameron wrote:
> On Mon, 06 Feb 2023 13:51:19 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> CXL Memory Device SW Guide rev1.0 2.11.2 provides instruction on how to
>> caluclate latency and bandwidth for CXL memory device. Calculate minimum
> 
> Spell check your descriptions (I often forget to do this as well!
> )
>> bandwidth and total latency for the path from the CXL device to the root
>> port. The calculates values are stored in the cached DSMAS entries attached
>> to the cxl_port of the CXL device.
>>
>> For example for a device that is directly attached to a host bus:
>> Total Latency = Device Latency (from CDAT) + Dev to Host Bus (HB) Link
>> 		Latency
>> Min Bandwidth = Link Bandwidth between Host Bus and CXL device
>>
>> For a device that has a switch in between host bus and CXL device:
>> Total Latency = Device (CDAT) Latency + Dev to Switch Link Latency +
>> 		Switch (CDAT) Latency + Switch to HB Link Latency
> 
> For QTG purposes, are we also supposed to take into account HB to
> system interconnect type latency (or maybe nearest CPU?).
> That is likely to be non trivial.

Dan brought this ECN [1] to my attention. We can add this if we can find 
a BIOS that implements the ECN. Or should we code a place holder for it 
until this is available?

https://lore.kernel.org/linux-cxl/e1a52da9aec90766da5de51b1b839fd95d63a5af.camel@intel.com/

> 
>> Min Bandwidth = min(dev to switch bandwidth, switch to HB bandwidth)
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> Stray sign off.
> 
>>
>> The internal latency for a switch can be retrieved from the CDAT of the
>> switch PCI device. However, since there's no easy way to retrieve that
>> right now on Linux, a guesstimated constant is used per switch to simplify
>> the driver code.
> 
> I'd like to see that gap closed asap. I think it is fairly obvious how to do
> it, so shouldn't be too hard, just needs a dance to get the DOE for a switch
> port using Lukas' updated handling of DOE mailboxes.

Talked to Lukas and this may not be difficult with his latest changes. I 
can take a look. Do we support switch CDAT in QEMU yet?

> 
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>>   drivers/cxl/core/port.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/cxl/cxl.h       |    9 +++++++
>>   drivers/cxl/port.c      |   42 +++++++++++++++++++++++++++++++++
>>   3 files changed, 111 insertions(+)
>>
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 2b27319cfd42..aa260361ba7d 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -1899,6 +1899,66 @@ bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
>>   }
>>   EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
>>   
>> +int cxl_port_get_downstream_qos(struct cxl_port *port, long *bw, long *lat)
>> +{
>> +	long total_lat = 0, latency;
> 
> Similar to before, not good for readability to hide asignments in a list all on one line.
>
Jonathan Cameron Feb. 15, 2023, 1:17 p.m. UTC | #14
On Tue, 14 Feb 2023 16:03:27 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 2/9/23 8:24 AM, Jonathan Cameron wrote:
> > On Mon, 06 Feb 2023 13:51:19 -0700
> > Dave Jiang <dave.jiang@intel.com> wrote:
> >   
> >> CXL Memory Device SW Guide rev1.0 2.11.2 provides instruction on how to
> >> caluclate latency and bandwidth for CXL memory device. Calculate minimum  
> > 
> > Spell check your descriptions (I often forget to do this as well!
> > )  
> >> bandwidth and total latency for the path from the CXL device to the root
> >> port. The calculates values are stored in the cached DSMAS entries attached
> >> to the cxl_port of the CXL device.
> >>
> >> For example for a device that is directly attached to a host bus:
> >> Total Latency = Device Latency (from CDAT) + Dev to Host Bus (HB) Link
> >> 		Latency
> >> Min Bandwidth = Link Bandwidth between Host Bus and CXL device
> >>
> >> For a device that has a switch in between host bus and CXL device:
> >> Total Latency = Device (CDAT) Latency + Dev to Switch Link Latency +
> >> 		Switch (CDAT) Latency + Switch to HB Link Latency  
> > 
> > For QTG purposes, are we also supposed to take into account HB to
> > system interconnect type latency (or maybe nearest CPU?).
> > That is likely to be non trivial.  
> 
> Dan brought this ECN [1] to my attention. We can add this if we can find 
> a BIOS that implements the ECN. Or should we code a place holder for it 
> until this is available?
> 
> https://lore.kernel.org/linux-cxl/e1a52da9aec90766da5de51b1b839fd95d63a5af.camel@intel.com/

I've had Generic Ports on my list to add to QEMU for a while but not been
high enough priority to either do it myself, or make it someone else's problem.
I suspect the biggest barrier in QEMU is going to be the interface to add
these to the NUMA description.

It's easy enough to hand build and inject a SRAT /SLIT/HMAT tables with
these in (that's how we developed the Generic Initiator support in Linux before
any BIOS support).  

So I'd like to see it soon, but I'm not hugely bothered if that element
follows this patch set. However, we are potentially going to see different
decisions made when that detail is added so it 'might' count as ABI
breakage if it's not there from the start. I think we are fine as probably
no BIOS' yet though.

> 
> >   
> >> Min Bandwidth = min(dev to switch bandwidth, switch to HB bandwidth)
> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com>  
> > 
> > Stray sign off.
> >   
> >>
> >> The internal latency for a switch can be retrieved from the CDAT of the
> >> switch PCI device. However, since there's no easy way to retrieve that
> >> right now on Linux, a guesstimated constant is used per switch to simplify
> >> the driver code.  
> > 
> > I'd like to see that gap closed asap. I think it is fairly obvious how to do
> > it, so shouldn't be too hard, just needs a dance to get the DOE for a switch
> > port using Lukas' updated handling of DOE mailboxes.  
> 
> Talked to Lukas and this may not be difficult with his latest changes. I 
> can take a look. Do we support switch CDAT in QEMU yet?

I started typing no, then thought I'd just check.  Seems I did write support
for CDAT on switches (and then completely forgot about it ;)
It's upstream and everything!
https://elixir.bootlin.com/qemu/latest/source/hw/pci-bridge/cxl_upstream.c#L194
Dave Jiang Feb. 15, 2023, 4:38 p.m. UTC | #15
On 2/15/23 6:17 AM, Jonathan Cameron wrote:
> On Tue, 14 Feb 2023 16:03:27 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
>> On 2/9/23 8:24 AM, Jonathan Cameron wrote:
>>> On Mon, 06 Feb 2023 13:51:19 -0700
>>> Dave Jiang <dave.jiang@intel.com> wrote:
>>>    
>>>> CXL Memory Device SW Guide rev1.0 2.11.2 provides instruction on how to
>>>> caluclate latency and bandwidth for CXL memory device. Calculate minimum
>>>
>>> Spell check your descriptions (I often forget to do this as well!
>>> )
>>>> bandwidth and total latency for the path from the CXL device to the root
>>>> port. The calculates values are stored in the cached DSMAS entries attached
>>>> to the cxl_port of the CXL device.
>>>>
>>>> For example for a device that is directly attached to a host bus:
>>>> Total Latency = Device Latency (from CDAT) + Dev to Host Bus (HB) Link
>>>> 		Latency
>>>> Min Bandwidth = Link Bandwidth between Host Bus and CXL device
>>>>
>>>> For a device that has a switch in between host bus and CXL device:
>>>> Total Latency = Device (CDAT) Latency + Dev to Switch Link Latency +
>>>> 		Switch (CDAT) Latency + Switch to HB Link Latency
>>>
>>> For QTG purposes, are we also supposed to take into account HB to
>>> system interconnect type latency (or maybe nearest CPU?).
>>> That is likely to be non trivial.
>>
>> Dan brought this ECN [1] to my attention. We can add this if we can find
>> a BIOS that implements the ECN. Or should we code a place holder for it
>> until this is available?
>>
>> https://lore.kernel.org/linux-cxl/e1a52da9aec90766da5de51b1b839fd95d63a5af.camel@intel.com/
> 
> I've had Generic Ports on my list to add to QEMU for a while but not been
> high enough priority to either do it myself, or make it someone else's problem.
> I suspect the biggest barrier in QEMU is going to be the interface to add
> these to the NUMA description.
> 
> It's easy enough to hand build and inject a SRAT /SLIT/HMAT tables with
> these in (that's how we developed the Generic Initiator support in Linux before
> any BIOS support).
> 
> So I'd like to see it soon, but I'm not hugely bothered if that element
> follows this patch set. However, we are potentially going to see different
> decisions made when that detail is added so it 'might' count as ABI
> breakage if it's not there from the start. I think we are fine as probably
> no BIOS' yet though.
> 
>>
>>>    
>>>> Min Bandwidth = min(dev to switch bandwidth, switch to HB bandwidth)
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>
>>> Stray sign off.
>>>    
>>>>
>>>> The internal latency for a switch can be retrieved from the CDAT of the
>>>> switch PCI device. However, since there's no easy way to retrieve that
>>>> right now on Linux, a guesstimated constant is used per switch to simplify
>>>> the driver code.
>>>
>>> I'd like to see that gap closed asap. I think it is fairly obvious how to do
>>> it, so shouldn't be too hard, just needs a dance to get the DOE for a switch
>>> port using Lukas' updated handling of DOE mailboxes.
>>
>> Talked to Lukas and this may not be difficult with his latest changes. I
>> can take a look. Do we support switch CDAT in QEMU yet?
> 
> I started typing no, then thought I'd just check.  Seems I did write support
> for CDAT on switches (and then completely forgot about it ;)
> It's upstream and everything!
> https://elixir.bootlin.com/qemu/latest/source/hw/pci-bridge/cxl_upstream.c#L194
> 
Awesome! I'll go poke around a bit. Also it's very helpful to see the 
creation code. Helped me realize that I need to support parsing of 
SSLBIS sub-table for switches. Thanks!