Message ID | 20250404-pcie-reset-slot-v1-0-98952918bf90@linaro.org |
---|---|
Headers | show |
Series | PCI: Add support for resetting the slots in a platform specific way | expand |
On Fri, Apr 04, 2025 at 01:52:22PM +0530, Manivannan Sadhasivam via B4 Relay wrote: > When the PCI error handling requires resetting the slot, reset it using the > host bridge specific 'reset_slot' callback if available before calling the > 'slot_reset' callback of the PCI drivers. > > The 'reset_slot' callback is responsible for resetting the given slot > referenced by the 'pci_dev' pointer in a platform specific way and bring it > back to the working state if possible. If any error occurs during the slot > reset operation, relevant errno should be returned. [...] > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -234,11 +234,16 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > } > > if (status == PCI_ERS_RESULT_NEED_RESET) { > - /* > - * TODO: Should call platform-specific > - * functions to reset slot before calling > - * drivers' slot_reset callbacks? > - */ > + if (host->reset_slot) { > + ret = host->reset_slot(host, bridge); > + if (ret) { > + pci_err(bridge, "failed to reset slot: %d\n", > + ret); > + status = PCI_ERS_RESULT_DISCONNECT; > + goto failed; > + } > + } > + This feels like something that should be plumbed into pcibios_reset_secondary_bus(), rather than pcie_do_recovery(). Note that in the DPC case, pcie_do_recovery() doesn't issue a reset itself. The reset has already happened, it was automatically done by the hardware and all the kernel needs to do is bring up the link again. Do you really need any special handling for that in the host controller driver? Only in the AER case do you want to issue a reset on the secondary bus and if there's any platform-specific support needed for that, it needs to go into pcibios_reset_secondary_bus(). Thanks, Lukas
On Fri, Apr 04, 2025 at 10:46:27AM +0200, Lukas Wunner wrote: > On Fri, Apr 04, 2025 at 01:52:22PM +0530, Manivannan Sadhasivam via B4 Relay wrote: > > When the PCI error handling requires resetting the slot, reset it using the > > host bridge specific 'reset_slot' callback if available before calling the > > 'slot_reset' callback of the PCI drivers. > > > > The 'reset_slot' callback is responsible for resetting the given slot > > referenced by the 'pci_dev' pointer in a platform specific way and bring it > > back to the working state if possible. If any error occurs during the slot > > reset operation, relevant errno should be returned. > [...] > > --- a/drivers/pci/pcie/err.c > > +++ b/drivers/pci/pcie/err.c > > @@ -234,11 +234,16 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > > } > > > > if (status == PCI_ERS_RESULT_NEED_RESET) { > > - /* > > - * TODO: Should call platform-specific > > - * functions to reset slot before calling > > - * drivers' slot_reset callbacks? > > - */ > > + if (host->reset_slot) { > > + ret = host->reset_slot(host, bridge); > > + if (ret) { > > + pci_err(bridge, "failed to reset slot: %d\n", > > + ret); > > + status = PCI_ERS_RESULT_DISCONNECT; > > + goto failed; > > + } > > + } > > + > > This feels like something that should be plumbed into > pcibios_reset_secondary_bus(), rather than pcie_do_recovery(). > I did consider that, but didn't go for it since there was no platform reset code present in that function. But I will try to use it as I don't have a strong preference to do reset slot in pcie_do_recovery(). > Note that in the DPC case, pcie_do_recovery() doesn't issue a reset > itself. The reset has already happened, it was automatically done > by the hardware and all the kernel needs to do is bring up the link > again. Do you really need any special handling for that in the > host controller driver? > I haven't tested DPC, so I'm not sure if reset slot is needed or not. > Only in the AER case do you want to issue a reset on the secondary bus > and if there's any platform-specific support needed for that, it needs > to go into pcibios_reset_secondary_bus(). > Ok. I'm trying out this right now and will see if it satisfies my requirement (for both AER fatal and Link Down recovery). - Mani
On Tue, Apr 15, 2025 at 07:03:17PM +0530, Manivannan Sadhasivam wrote: > On Fri, Apr 04, 2025 at 10:46:27AM +0200, Lukas Wunner wrote: > > On Fri, Apr 04, 2025 at 01:52:22PM +0530, Manivannan Sadhasivam via B4 Relay wrote: > > > When the PCI error handling requires resetting the slot, reset it > > > using the host bridge specific 'reset_slot' callback if available > > > before calling the 'slot_reset' callback of the PCI drivers. > > > > > > The 'reset_slot' callback is responsible for resetting the given slot > > > referenced by the 'pci_dev' pointer in a platform specific way and > > > bring it back to the working state if possible. If any error occurs > > > during the slot reset operation, relevant errno should be returned. > > > > This feels like something that should be plumbed into > > pcibios_reset_secondary_bus(), rather than pcie_do_recovery(). > > I did consider that, but didn't go for it since there was no platform > reset code present in that function. But I will try to use it as I > don't have a strong preference to do reset slot in pcie_do_recovery(). The only platform overriding pcibios_reset_secondary_bus() is powerpc, and it only does that on PowerNV. I think you could continue to stick with the approach of adding a ->reset_slot() callback to struct pci_host_bridge, but it would be good if at the same time you could convert PowerNV to use the newly introduced callback as well. And then remove the way to override the reset procedure via pcibios_reset_secondary_bus(). All pci_host_bridge's which do not define a ->reset_slot() could be assigned a default callback which just calls pci_reset_secondary_bus(). Alternatively, pcibios_reset_secondary_bus() could be made to invoke the pci_host_bridge's ->reset_slot() callback if it's not NULL, else pci_reset_secondary_bus(). And in that case, the __weak attribute could be removed as well as the powerpc-specific version of pcibios_reset_secondary_bus(). I guess what I'm trying to say is, you don't *have* to plumb this into pcibios_reset_secondary_bus(). In fact, having a host bridge specific callback could be useful if the SoC contains several host bridges which require different callbacks to perform a reset. I just want to make sure that the code remains maintainable, i.e. we don't have two separate ways to override how a bus reset is performed. Thanks, Lukas
On Wed, Apr 16, 2025 at 04:38:01PM +0200, Lukas Wunner wrote: > On Tue, Apr 15, 2025 at 07:03:17PM +0530, Manivannan Sadhasivam wrote: > > On Fri, Apr 04, 2025 at 10:46:27AM +0200, Lukas Wunner wrote: > > > On Fri, Apr 04, 2025 at 01:52:22PM +0530, Manivannan Sadhasivam via B4 Relay wrote: > > > > When the PCI error handling requires resetting the slot, reset it > > > > using the host bridge specific 'reset_slot' callback if available > > > > before calling the 'slot_reset' callback of the PCI drivers. > > > > > > > > The 'reset_slot' callback is responsible for resetting the given slot > > > > referenced by the 'pci_dev' pointer in a platform specific way and > > > > bring it back to the working state if possible. If any error occurs > > > > during the slot reset operation, relevant errno should be returned. > > > > > > This feels like something that should be plumbed into > > > pcibios_reset_secondary_bus(), rather than pcie_do_recovery(). > > > > I did consider that, but didn't go for it since there was no platform > > reset code present in that function. But I will try to use it as I > > don't have a strong preference to do reset slot in pcie_do_recovery(). > > The only platform overriding pcibios_reset_secondary_bus() is powerpc, > and it only does that on PowerNV. > > I think you could continue to stick with the approach of adding a > ->reset_slot() callback to struct pci_host_bridge, but it would > be good if at the same time you could convert PowerNV to use the > newly introduced callback as well. And then remove the way to > override the reset procedure via pcibios_reset_secondary_bus(). > > All pci_host_bridge's which do not define a ->reset_slot() could be > assigned a default callback which just calls pci_reset_secondary_bus(). > > Alternatively, pcibios_reset_secondary_bus() could be made to invoke the > pci_host_bridge's ->reset_slot() callback if it's not NULL, else > pci_reset_secondary_bus(). Yes. This is what I now tried locally as well. > And in that case, the __weak attribute > could be removed as well as the powerpc-specific version of > pcibios_reset_secondary_bus(). > I don't think it is possible to get rid of the powerpc version. It has its own pci_dev::sysdata pointing to 'struct pci_controller' pointer which is internal to powerpc arch code. And the generic code would need 'struct pci_host_bridge' to access the callback. > I guess what I'm trying to say is, you don't *have* to plumb this > into pcibios_reset_secondary_bus(). In fact, having a host bridge > specific callback could be useful if the SoC contains several > host bridges which require different callbacks to perform a reset. > > I just want to make sure that the code remains maintainable, > i.e. we don't have two separate ways to override how a bus reset > is performed. > I think we need to have the override in powerpc anyway. But I will move the 'reset_slot' callback to it and get it called from pci_bus_error_reset() (for both AER and Link Down). - Mani
Hi, Currently, in the event of AER/DPC, PCI core will try to reset the slot and its subordinate devices by invoking bridge control reset and FLR. But in some cases like AER Fatal error, it might be necessary to reset the slots using the PCI host bridge drivers in a platform specific way (as indicated by the TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c). Otherwise, the PCI link won't be recovered successfully. So this series adds a new callback 'pci_host_bridge::reset_slot' for the host bridge drivers to reset the slot when a fatal error happens. Also, this series allows the host bridge drivers to handle PCI link down event by resetting the slots and recovering the bus. This is accomplished by the help of a new API 'pci_host_handle_link_down()'. Host bridge drivers are expected to call this API (preferrably from a threaded IRQ handler) when a link down event is detected. The API will reuse the pcie_do_recovery() function to recover the link if AER support is enabled, otherwise it will directly call the reset_slot() callback of the host bridge driver (if exists). For reference, I've modified the pcie-qcom driver to call pci_host_handle_link_down() after receiving LINK_DOWN global_irq event and populated the 'pci_host_bridge::reset_slot()' callback to reset the controller (there by slots). Since the Qcom PCIe controllers support only a single root port (slot) per controller instance, reset_slot() callback is going to be invoked only once. For multi root port controllers, this callback is supposed to identify the slots using the supplied 'pci_dev' pointer and reset them. NOTE ==== This series is a reworked version of the earlier series [1] that I submitted for handling PCI link down event. In this series, I've made use of the AER helpers to recover the link as it allows notifying the device drivers and also allows saving/restoring the config space. Testing ======= This series is tested on Qcom RB5 and SA8775p Ride boards by triggering the link down event manually by writing to LTSSM register. For the error recovery to succeed (if AER is enabled), all the drivers in the bridge hierarchy should have the 'err_handlers' populated. Otherwise, the link recovery will fail. [1] https://lore.kernel.org/linux-pci/20250221172309.120009-1-manivannan.sadhasivam@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> --- Manivannan Sadhasivam (4): PCI/ERR: Remove misleading TODO regarding kernel panic PCI/ERR: Add support for resetting the slot in a platforms specific way PCI: Add link down handling for host bridges PCI: qcom: Add support for resetting the slot due to link down event drivers/pci/controller/dwc/pcie-qcom.c | 89 +++++++++++++++++++++++++++++++++- drivers/pci/pci.h | 22 +++++++++ drivers/pci/pcie/err.c | 29 ++++++++--- drivers/pci/probe.c | 7 +++ include/linux/pci.h | 2 + 5 files changed, 140 insertions(+), 9 deletions(-) --- base-commit: 08733088b566b58283f0f12fb73f5db6a9a9de30 change-id: 20250404-pcie-reset-slot-730bfa71a202 Best regards,