diff mbox series

[v2] PCI/ACPI: Add extra slot register check for non-ACPI device

Message ID 1701174316-14149-1-git-send-email-wangdong202303@163.com
State New
Headers show
Series [v2] PCI/ACPI: Add extra slot register check for non-ACPI device | expand

Commit Message

wangdong28 Nov. 28, 2023, 12:25 p.m. UTC
From: Dong Wang <wangdong28@lenovo.com>

When enabling VMD function in UEFI setup, the physical slot of the M.2
NVMe device connected to the VMD device cannot be detected. Here is
the result from lspci ("Physical Slot" field is NOT shown):

 10001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe
 Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
   Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE M.2 (P4511)

Generally, the physical slot (/sys/bus/pci/slots) will be created via
either ACPI walking path during kernel init or hotplug path:

ACPI walking path:
  pcibios_add_bus
    acpi_pci_add_bus
      acpi_pci_slot_enumerate
        acpi_walk_namespace
          register_slot
            pci_create_slot

hotplug path:
  __pci_hp_initialize
    pci_create_slot

[M.2 NVMe Device]
A. VMD disabled
When VMD is disabled, NVMe will be discovered during bus scanning and
recognized as acpi device. In this case, the physical slot is created
via the ACPI walking path.

B. VMD enabled
vmd_enable_domain() invokes pcibios_add_bus(). This means that it goes
through the ACPI walking path. However, acpi_pci_add_bus() returns
directly becase the statment "!ACPI_HANDLE(bus->bridge)" is true.
See the following code snippet:

  void acpi_pci_add_bus(struct pci_bus *bus)
  {
      ...
      if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge))
		return;
      ...
  }

Since VMD creates its own root bus and devices of VMD are attached to
the bus, those devices are non-ACPI devices. That's why
"!ACPI_HANDLE(bus->bridge)" returns true.

In addition, M.2 NVMe devices does not have the hotplug capability.
Here is the quote from PCI Express M.2 Specification (Revision 5.0,
Version 1.0):

  CAUTION: M.2 Add-in Cards are not designed or intended to support
  Hot-Swap or Hot-Plug connections. Performing Hot-Swap or Hot-Plug
  may pose danger to the M.2 Add-in Card, to the system Platform,
  and to the person performing this act.

M.2 NVMe devices (non-ACPI devices and no hotplug capability) connected
to the VMD device cannot meet the above-mentioned paths. The corresponding
slot info of the M.2 NVMe controller cannot be created in
/sys/bus/pci/slots.

Fix this issue by checking the available physical slot number in
slot capabilities register. If the physical slot number is available,
create the slot info accordingly. The following lspci output shows the
available slot info with applying this patch:

 10001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe
 Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
   Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE M.2 (P4511)
   Physical Slot: 16

[U.2 NVMe device]
A. VMD disabled
Same as M.2 NVMe Device case "A".

B. VMD enabled
Same as M.2 NVMe Device case "B".

The hotplug of the U.2 device is optional (See "PCI Express SFF-8639 Module
Specification" for detail). The U.2 NVMe controller with hotplug capability
connected to the VMD device can meet the hotplug path, so the slot info can
be shown correctly via the lspci utility (without this patch):

 10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe
 Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
   Subsystem: Lenovo Thinksystem U.2 P4610 NVMe SSD
   Physical Slot: 64

For U.2 NVMe controller without hotplug capability, this patch is needed
to fix the missing slot info.

Suggested-and-reviewed-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Dong Wang <wangdong28@lenovo.com>
---
v2:
  * Fix the build error for non-x86 arch

---
 arch/x86/pci/common.c  | 21 +++++++++++++++++++++
 drivers/pci/pci-acpi.c |  9 ++++++++-
 include/linux/pci.h    |  1 +
 3 files changed, 30 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Dec. 1, 2023, 8:02 p.m. UTC | #1
On Tue, Nov 28, 2023 at 08:25:16PM +0800, wangdong28 wrote:
> From: Dong Wang <wangdong28@lenovo.com>
> 
> When enabling VMD function in UEFI setup, the physical slot of the M.2
> NVMe device connected to the VMD device cannot be detected. Here is
> the result from lspci ("Physical Slot" field is NOT shown):

Apparently you're referring to the Physical Slot Number in the Slot
Capabilities register of a VMD Root Port?  That would not be related
to the NVMe device or whatever is *connected* to that Root Port.  It's
important to use the specific names to understand what's happening
here.

>  10001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe
>  Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
>    Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE M.2 (P4511)
> 
> Generally, the physical slot (/sys/bus/pci/slots) will be created via
> either ACPI walking path during kernel init or hotplug path:
> 
> ACPI walking path:
>   pcibios_add_bus
>     acpi_pci_add_bus
>       acpi_pci_slot_enumerate
>         acpi_walk_namespace
>           register_slot
>             pci_create_slot

IIUC this path registers the slot with a slot number from the ACPI
_SUN method (register_slot() calls check_slot() to evaluate _SUN).

> hotplug path:
>   __pci_hp_initialize
>     pci_create_slot

IIUC this path registers the slot with a slot number from
PCI_EXP_SLTCAP_PSN (see init_slot() in pciehp).

> [M.2 NVMe Device]
> A. VMD disabled
> When VMD is disabled, NVMe will be discovered during bus scanning and
> recognized as acpi device. In this case, the physical slot is created
> via the ACPI walking path.

s/acpi device/ACPI device/

> B. VMD enabled
> vmd_enable_domain() invokes pcibios_add_bus(). This means that it goes
> through the ACPI walking path. However, acpi_pci_add_bus() returns
> directly becase the statment "!ACPI_HANDLE(bus->bridge)" is true.
> See the following code snippet:

s/becase/because/
s/statment/statement/

>   void acpi_pci_add_bus(struct pci_bus *bus)
>   {
>       ...
>       if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge))
> 		return;
>       ...
>   }
> 
> Since VMD creates its own root bus and devices of VMD are attached to
> the bus, those devices are non-ACPI devices. That's why
> "!ACPI_HANDLE(bus->bridge)" returns true.
> 
> In addition, M.2 NVMe devices does not have the hotplug capability.
> Here is the quote from PCI Express M.2 Specification (Revision 5.0,
> Version 1.0):
>
>   CAUTION: M.2 Add-in Cards are not designed or intended to support
>   Hot-Swap or Hot-Plug connections. Performing Hot-Swap or Hot-Plug
>   may pose danger to the M.2 Add-in Card, to the system Platform,
>   and to the person performing this act.

Why do we care about hotplug?  I don't think the Physical Slot Number
depends on hotplug support.

But it does look like we don't look at PCI_EXP_SLTCAP_PSN except in
pciehp, so if a slot doesn't support hotplug, I guess we probably
don't expose the slot in /sys/bus/pci/slots/.  I dunno whether that's
the right thing or not.

I can see that it might be useful to know what physical slot a device
is in even if the slot doesn't support hotplug.  And it seems that's
what you want to do here?  You want to expose the "slot" number of a
particular M.2 connector, even though the socket doesn't support
hotplug?

> M.2 NVMe devices (non-ACPI devices and no hotplug capability) connected
> to the VMD device cannot meet the above-mentioned paths. The corresponding
> slot info of the M.2 NVMe controller cannot be created in
> /sys/bus/pci/slots.
>
> Fix this issue by checking the available physical slot number in
> slot capabilities register. If the physical slot number is available,
> create the slot info accordingly. The following lspci output shows the
> available slot info with applying this patch:

s/physical slot number/Physical Slot Number/
s/slot capabilities/Slot Capabilities/

Capitalize them so we know they refer specifically to things in the
PCIe spec.

>  10001:01:00.0 Non-Volatile memory controller: Intel Corporation NVMe
>  Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
>    Subsystem: Intel Corporation NVMe Datacenter SSD [3DNAND] SE M.2 (P4511)
>    Physical Slot: 16
> 
> [U.2 NVMe device]
> A. VMD disabled
> Same as M.2 NVMe Device case "A".
> 
> B. VMD enabled
> Same as M.2 NVMe Device case "B".
> 
> The hotplug of the U.2 device is optional (See "PCI Express SFF-8639 Module
> Specification" for detail). The U.2 NVMe controller with hotplug capability
> connected to the VMD device can meet the hotplug path, so the slot info can
> be shown correctly via the lspci utility (without this patch):
> 
>  10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe
>  Datacenter SSD [3DNAND, Beta Rock Controller] (prog-if 02 [NVM Express])
>    Subsystem: Lenovo Thinksystem U.2 P4610 NVMe SSD
>    Physical Slot: 64
> 
> For U.2 NVMe controller without hotplug capability, this patch is needed
> to fix the missing slot info.

So it seems like the question is whether we want to expose the slot
number in *general* (not just for M.2, U.2, etc) even when the slot
does not support hotplug.  The same would apply to normal PCIe slots
in a server or desktop, where you would put a GPU, NIC, etc, etc.

M.2 is a form factor specification that really doesn't have anything
to do with the enumeration and configuration done by software.  As far
as I can tell, there's nothing special about M.2 that justifies this
patch, so while M.2 might be an *example* of a case where this is
useful, it's not the *reason* for making a change like this.

> Suggested-and-reviewed-by: Adrian Huang <ahuang12@lenovo.com>
> Signed-off-by: Dong Wang <wangdong28@lenovo.com>
> ---
> v2:
>   * Fix the build error for non-x86 arch
> 
> ---
>  arch/x86/pci/common.c  | 21 +++++++++++++++++++++
>  drivers/pci/pci-acpi.c |  9 ++++++++-
>  include/linux/pci.h    |  1 +
>  3 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index ddb7986..b657b07 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -731,4 +731,25 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
>  
>  	return dev;
>  }
> +
> +#define SLOT_NAME_SIZE  5
> +
> +void pci_check_extra_slot_register(struct pci_bus *bus)
> +{
> +	struct pci_dev *pdev = bus->self;
> +	char slot_name[SLOT_NAME_SIZE];
> +	struct pci_slot *pci_slot;
> +	u32 slot_cap, slot_nr;
> +
> +	if (!is_vmd(bus) || !pdev || pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap))
> +		return;
> +
> +	if (!(slot_cap & PCI_EXP_SLTCAP_HPC)) {
> +		slot_nr = (slot_cap & PCI_EXP_SLTCAP_PSN) >> 19;
> +		snprintf(slot_name, SLOT_NAME_SIZE, "%u", slot_nr);
> +		pci_slot = pci_create_slot(bus, 0, slot_name, NULL);
> +		if (IS_ERR(pci_slot))
> +			pr_err("pci_create_slot returned %ld\n", PTR_ERR(pci_slot));

This patch basically reads the Physical Slot Number in the Slot
Capabilities register in a case where we didn't previously do that.

I guess we're reading PCI_EXP_SLTCAP from a VMD Root Port?  And we
currently don't do that for some reason?  I assume the same exact
problem would occur if that VMD Root Port were connected to an
ordinary PCIe slot?

> +	}
> +}
>  #endif
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 0045750..e2f2ba8 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -884,6 +884,8 @@ acpi_status pci_acpi_add_pm_notifier(struct acpi_device *dev,
>  	return acpi_add_pm_notifier(dev, &pci_dev->dev, pci_acpi_wake_dev);
>  }
>  
> +void __weak pci_check_extra_slot_register(struct pci_bus *bus) { }
> +
>  /*
>   * _SxD returns the D-state with the highest power
>   * (lowest D-state number) supported in the S-state "x".
> @@ -1202,9 +1204,14 @@ void acpi_pci_add_bus(struct pci_bus *bus)
>  	union acpi_object *obj;
>  	struct pci_host_bridge *bridge;
>  
> -	if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge))
> +	if (acpi_pci_disabled || !bus->bridge)
>  		return;
>  
> +	if (!ACPI_HANDLE(bus->bridge)) {
> +		pci_check_extra_slot_register(bus);
> +		return;
> +	}
> +
>  	acpi_pci_slot_enumerate(bus);
>  	acpiphp_enumerate_slots(bus);
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 60ca768..b9bb447 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1394,6 +1394,7 @@ static inline int pci_rebar_bytes_to_size(u64 bytes)
>  bool pci_device_is_present(struct pci_dev *pdev);
>  void pci_ignore_hotplug(struct pci_dev *dev);
>  struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
> +void pci_check_extra_slot_register(struct pci_bus *bus);
>  int pci_status_get_and_clear_errors(struct pci_dev *pdev);
>  
>  int __printf(6, 7) pci_request_irq(struct pci_dev *dev, unsigned int nr,
> -- 
> 1.8.3.1
>
diff mbox series

Patch

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index ddb7986..b657b07 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -731,4 +731,25 @@  struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
 
 	return dev;
 }
+
+#define SLOT_NAME_SIZE  5
+
+void pci_check_extra_slot_register(struct pci_bus *bus)
+{
+	struct pci_dev *pdev = bus->self;
+	char slot_name[SLOT_NAME_SIZE];
+	struct pci_slot *pci_slot;
+	u32 slot_cap, slot_nr;
+
+	if (!is_vmd(bus) || !pdev || pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap))
+		return;
+
+	if (!(slot_cap & PCI_EXP_SLTCAP_HPC)) {
+		slot_nr = (slot_cap & PCI_EXP_SLTCAP_PSN) >> 19;
+		snprintf(slot_name, SLOT_NAME_SIZE, "%u", slot_nr);
+		pci_slot = pci_create_slot(bus, 0, slot_name, NULL);
+		if (IS_ERR(pci_slot))
+			pr_err("pci_create_slot returned %ld\n", PTR_ERR(pci_slot));
+	}
+}
 #endif
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 0045750..e2f2ba8 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -884,6 +884,8 @@  acpi_status pci_acpi_add_pm_notifier(struct acpi_device *dev,
 	return acpi_add_pm_notifier(dev, &pci_dev->dev, pci_acpi_wake_dev);
 }
 
+void __weak pci_check_extra_slot_register(struct pci_bus *bus) { }
+
 /*
  * _SxD returns the D-state with the highest power
  * (lowest D-state number) supported in the S-state "x".
@@ -1202,9 +1204,14 @@  void acpi_pci_add_bus(struct pci_bus *bus)
 	union acpi_object *obj;
 	struct pci_host_bridge *bridge;
 
-	if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge))
+	if (acpi_pci_disabled || !bus->bridge)
 		return;
 
+	if (!ACPI_HANDLE(bus->bridge)) {
+		pci_check_extra_slot_register(bus);
+		return;
+	}
+
 	acpi_pci_slot_enumerate(bus);
 	acpiphp_enumerate_slots(bus);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768..b9bb447 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1394,6 +1394,7 @@  static inline int pci_rebar_bytes_to_size(u64 bytes)
 bool pci_device_is_present(struct pci_dev *pdev);
 void pci_ignore_hotplug(struct pci_dev *dev);
 struct pci_dev *pci_real_dma_dev(struct pci_dev *dev);
+void pci_check_extra_slot_register(struct pci_bus *bus);
 int pci_status_get_and_clear_errors(struct pci_dev *pdev);
 
 int __printf(6, 7) pci_request_irq(struct pci_dev *dev, unsigned int nr,