Message ID | 20240401-pci-epf-rework-v2-10-970dbe90b99d@linaro.org |
---|---|
State | New |
Headers | show |
Series | PCI: endpoint: Make host reboot handling more robust | expand |
On Wed, Apr 03, 2024 at 07:02:17PM +0530, Manivannan Sadhasivam wrote: > On Tue, Apr 02, 2024 at 01:18:54PM +0200, Niklas Cassel wrote: > > On Mon, Apr 01, 2024 at 09:20:36PM +0530, Manivannan Sadhasivam wrote: > > > PCIe host controller drivers are supposed to properly reset the endpoint > > > devices during host shutdown/reboot. Currently, Qcom driver doesn't do > > > anything during host shutdown/reboot, resulting in both PERST# and refclk > > > getting disabled at the same time. This prevents the endpoint device > > > firmware to properly reset the state machine. Because, if the refclk is > > > cutoff immediately along with PERST#, access to device specific registers > > > within the endpoint will result in a firmware crash. > > > > > > To address this issue, let's call qcom_pcie_host_deinit() inside the > > > shutdown callback, that asserts PERST# and then cuts off the refclk with a > > > delay of 1ms, thus allowing the endpoint device firmware to properly > > > cleanup the state machine. > > > > Hm... a QCOM EP device could be attached to any of the PCIe RC drivers that > > we have in the kernel, so it seems a bit weird to fix this problem by > > patching the QCOM RC driver only. > > > > Which DBI call is it that causes this problem during perst assert on EP side? > > > > I assume that it is pci-epf-test:deinit() callback that calls > > pci_epc_clear_bar(), which calls dw_pcie_ep_clear_bar(), which will both: > > -clear local data structures, e.g. > > ep->epf_bar[bar] = NULL; > > ep->bar_to_atu[bar] = 0; > > > > but also call: > > __dw_pcie_ep_reset_bar() > > dw_pcie_disable_atu() > > > > > > Do we perhaps need to redesign the .deinit EPF callback? > > > > Considering that we know that .deinit() will only be called on platforms > > where there will be a fundamental core reset, I guess we could do something > > like introduce a __dw_pcie_ep_clear_bar() which will only clear the local > > data structures. (It might not need to do any DBI writes, since the > > fundamental core reset should have reset all values.) > > > > Or perhaps instead of letting pci_epf_test_epc_deinit() call > > pci_epf_test_clear_bar()/__pci_epf_test_clear_bar() directly, perhaps let > > pci_epf_test_epc_deinit() call add a .deinit()/.cleanup() defined in the > > EPC driver. > > > > This EPC .deinit()/.cleanup() callback would then only clear the > > local data structures (no DBI writes...). > > > > Something like that? > > > > It is not just about the EPF test driver. A function driver may need to do many > things to properly reset the state machine. Like in the case of MHI driver, it > needs to reset channel state, mask interrupts etc... and all requires writing to > some registers. So certainly there should be some time before cutting off the > refclk. I was more thinking that perhaps we should think of .deinit() as in how dw_pcie_ep_init() used to be. It was not allowed to have any DBI writes. (The DBI writes were all in dw_pcie_ep_init_complete()). So perhaps we could define that a EPF .deinit() callback is not allowed to have any DBI writes. If we take qcom-ep as an example, as soon as you get a PERST assertion the qcom-ep driver calls notify_deinit(), then asserts the reset control, disables clocks and regulators. Since the PCIe core is held in reset, the hardware is in a well defined state, no? Sure, the data structures e.g. bar_to_iatu[], etc., might be out of sync, but these could be memset():ed no? Since this is a fundamental reset, all registers should be reset to their default state (once reset is deasserted). For a real PCIe card, if you assert + msleep(100) + deassert PERST, surely the endpoint is supposed to be in a good/well defined state, regardless if he REFCLK was cutoff at the exact time as PERST was asserted or not? I would assume that we would want a PCI EPF driver to behave the same way, if possible. Kind regards, Niklas
On Wed, Apr 03, 2024 at 10:03:26PM +0200, Niklas Cassel wrote: > On Wed, Apr 03, 2024 at 07:02:17PM +0530, Manivannan Sadhasivam wrote: > > On Tue, Apr 02, 2024 at 01:18:54PM +0200, Niklas Cassel wrote: > > > On Mon, Apr 01, 2024 at 09:20:36PM +0530, Manivannan Sadhasivam wrote: > > > > PCIe host controller drivers are supposed to properly reset the endpoint > > > > devices during host shutdown/reboot. Currently, Qcom driver doesn't do > > > > anything during host shutdown/reboot, resulting in both PERST# and refclk > > > > getting disabled at the same time. This prevents the endpoint device > > > > firmware to properly reset the state machine. Because, if the refclk is > > > > cutoff immediately along with PERST#, access to device specific registers > > > > within the endpoint will result in a firmware crash. > > > > > > > > To address this issue, let's call qcom_pcie_host_deinit() inside the > > > > shutdown callback, that asserts PERST# and then cuts off the refclk with a > > > > delay of 1ms, thus allowing the endpoint device firmware to properly > > > > cleanup the state machine. > ... > For a real PCIe card, if you assert + msleep(100) + deassert PERST, surely > the endpoint is supposed to be in a good/well defined state, regardless if > he REFCLK was cutoff at the exact time as PERST was asserted or not? I think this is the key point. This PERST#/REFCLK timing requirement seems like a defect in the endpoint. I know there's firmware involved and whatnot, but IMO it's the endpoint's problem to handle these issues internally. Bjorn
On Wed, Apr 03, 2024 at 10:03:26PM +0200, Niklas Cassel wrote: > On Wed, Apr 03, 2024 at 07:02:17PM +0530, Manivannan Sadhasivam wrote: > > On Tue, Apr 02, 2024 at 01:18:54PM +0200, Niklas Cassel wrote: > > > On Mon, Apr 01, 2024 at 09:20:36PM +0530, Manivannan Sadhasivam wrote: > > > > PCIe host controller drivers are supposed to properly reset the endpoint > > > > devices during host shutdown/reboot. Currently, Qcom driver doesn't do > > > > anything during host shutdown/reboot, resulting in both PERST# and refclk > > > > getting disabled at the same time. This prevents the endpoint device > > > > firmware to properly reset the state machine. Because, if the refclk is > > > > cutoff immediately along with PERST#, access to device specific registers > > > > within the endpoint will result in a firmware crash. > > > > > > > > To address this issue, let's call qcom_pcie_host_deinit() inside the > > > > shutdown callback, that asserts PERST# and then cuts off the refclk with a > > > > delay of 1ms, thus allowing the endpoint device firmware to properly > > > > cleanup the state machine. > > > > > > Hm... a QCOM EP device could be attached to any of the PCIe RC drivers that > > > we have in the kernel, so it seems a bit weird to fix this problem by > > > patching the QCOM RC driver only. > > > > > > Which DBI call is it that causes this problem during perst assert on EP side? > > > > > > I assume that it is pci-epf-test:deinit() callback that calls > > > pci_epc_clear_bar(), which calls dw_pcie_ep_clear_bar(), which will both: > > > -clear local data structures, e.g. > > > ep->epf_bar[bar] = NULL; > > > ep->bar_to_atu[bar] = 0; > > > > > > but also call: > > > __dw_pcie_ep_reset_bar() > > > dw_pcie_disable_atu() > > > > > > > > > Do we perhaps need to redesign the .deinit EPF callback? > > > > > > Considering that we know that .deinit() will only be called on platforms > > > where there will be a fundamental core reset, I guess we could do something > > > like introduce a __dw_pcie_ep_clear_bar() which will only clear the local > > > data structures. (It might not need to do any DBI writes, since the > > > fundamental core reset should have reset all values.) > > > > > > Or perhaps instead of letting pci_epf_test_epc_deinit() call > > > pci_epf_test_clear_bar()/__pci_epf_test_clear_bar() directly, perhaps let > > > pci_epf_test_epc_deinit() call add a .deinit()/.cleanup() defined in the > > > EPC driver. > > > > > > This EPC .deinit()/.cleanup() callback would then only clear the > > > local data structures (no DBI writes...). > > > > > > Something like that? > > > > > > > It is not just about the EPF test driver. A function driver may need to do many > > things to properly reset the state machine. Like in the case of MHI driver, it > > needs to reset channel state, mask interrupts etc... and all requires writing to > > some registers. So certainly there should be some time before cutting off the > > refclk. > > I was more thinking that perhaps we should think of .deinit() as in how > dw_pcie_ep_init() used to be. It was not allowed to have any DBI writes. > (The DBI writes were all in dw_pcie_ep_init_complete()). > So perhaps we could define that a EPF .deinit() callback is not allowed > to have any DBI writes. > > If we take qcom-ep as an example, as soon as you get a PERST assertion > the qcom-ep driver calls notify_deinit(), then asserts the reset control, > disables clocks and regulators. > > Since the PCIe core is held in reset, the hardware is in a well defined > state, no? Sure, the data structures e.g. bar_to_iatu[], etc., might be > out of sync, but these could be memset():ed no? Since this is a fundamental > reset, all registers should be reset to their default state (once reset > is deasserted). > Well, we could prevent the register access during PERST# assert time in the endpoint, but my worry is that we will end up with 2 version of the cleanup APIs. Lets take an example of dw_pcie_edma_remove() API which gets called during deinit and it touches some eDMA registers. So should we introduce another API which just clears the sw data structure and not touching the registers? And this may be needed for other generic APIs as well. Ideally, if there is a Link Down event before PERST# assert, then this could've been solved, but that is also not a spec defined behavior. - Mani > For a real PCIe card, if you assert + msleep(100) + deassert PERST, surely > the endpoint is supposed to be in a good/well defined state, regardless if > he REFCLK was cutoff at the exact time as PERST was asserted or not? > > I would assume that we would want a PCI EPF driver to behave the same way, > if possible. > > > Kind regards, > Niklas
On Wed, Apr 10, 2024 at 04:24:10PM +0530, Manivannan Sadhasivam wrote: > On Wed, Apr 03, 2024 at 10:03:26PM +0200, Niklas Cassel wrote: > > On Wed, Apr 03, 2024 at 07:02:17PM +0530, Manivannan Sadhasivam wrote: > > > On Tue, Apr 02, 2024 at 01:18:54PM +0200, Niklas Cassel wrote: > > > > On Mon, Apr 01, 2024 at 09:20:36PM +0530, Manivannan Sadhasivam wrote: > > > > > PCIe host controller drivers are supposed to properly reset the endpoint > > > > > devices during host shutdown/reboot. Currently, Qcom driver doesn't do > > > > > anything during host shutdown/reboot, resulting in both PERST# and refclk > > > > > getting disabled at the same time. This prevents the endpoint device > > > > > firmware to properly reset the state machine. Because, if the refclk is > > > > > cutoff immediately along with PERST#, access to device specific registers > > > > > within the endpoint will result in a firmware crash. > > > > > > > > > > To address this issue, let's call qcom_pcie_host_deinit() inside the > > > > > shutdown callback, that asserts PERST# and then cuts off the refclk with a > > > > > delay of 1ms, thus allowing the endpoint device firmware to properly > > > > > cleanup the state machine. > > > > > > > > Hm... a QCOM EP device could be attached to any of the PCIe RC drivers that > > > > we have in the kernel, so it seems a bit weird to fix this problem by > > > > patching the QCOM RC driver only. > > > > > > > > Which DBI call is it that causes this problem during perst assert on EP side? > > > > > > > > I assume that it is pci-epf-test:deinit() callback that calls > > > > pci_epc_clear_bar(), which calls dw_pcie_ep_clear_bar(), which will both: > > > > -clear local data structures, e.g. > > > > ep->epf_bar[bar] = NULL; > > > > ep->bar_to_atu[bar] = 0; > > > > > > > > but also call: > > > > __dw_pcie_ep_reset_bar() > > > > dw_pcie_disable_atu() > > > > > > > > > > > > Do we perhaps need to redesign the .deinit EPF callback? > > > > > > > > Considering that we know that .deinit() will only be called on platforms > > > > where there will be a fundamental core reset, I guess we could do something > > > > like introduce a __dw_pcie_ep_clear_bar() which will only clear the local > > > > data structures. (It might not need to do any DBI writes, since the > > > > fundamental core reset should have reset all values.) > > > > > > > > Or perhaps instead of letting pci_epf_test_epc_deinit() call > > > > pci_epf_test_clear_bar()/__pci_epf_test_clear_bar() directly, perhaps let > > > > pci_epf_test_epc_deinit() call add a .deinit()/.cleanup() defined in the > > > > EPC driver. > > > > > > > > This EPC .deinit()/.cleanup() callback would then only clear the > > > > local data structures (no DBI writes...). > > > > > > > > Something like that? > > > > > > > > > > It is not just about the EPF test driver. A function driver may need to do many > > > things to properly reset the state machine. Like in the case of MHI driver, it > > > needs to reset channel state, mask interrupts etc... and all requires writing to > > > some registers. So certainly there should be some time before cutting off the > > > refclk. > > > > I was more thinking that perhaps we should think of .deinit() as in how > > dw_pcie_ep_init() used to be. It was not allowed to have any DBI writes. > > (The DBI writes were all in dw_pcie_ep_init_complete()). > > So perhaps we could define that a EPF .deinit() callback is not allowed > > to have any DBI writes. > > > > If we take qcom-ep as an example, as soon as you get a PERST assertion > > the qcom-ep driver calls notify_deinit(), then asserts the reset control, > > disables clocks and regulators. > > > > Since the PCIe core is held in reset, the hardware is in a well defined > > state, no? Sure, the data structures e.g. bar_to_iatu[], etc., might be > > out of sync, but these could be memset():ed no? Since this is a fundamental > > reset, all registers should be reset to their default state (once reset > > is deasserted). > > > > Well, we could prevent the register access during PERST# assert time in the > endpoint, but my worry is that we will end up with 2 version of the cleanup > APIs. Lets take an example of dw_pcie_edma_remove() API which gets called > during deinit and it touches some eDMA registers. > > So should we introduce another API which just clears the sw data structure and > not touching the registers? And this may be needed for other generic APIs as > well. I agree that it will be hard to come up with an elegant solution to this problem. These endpoint controllers that cannot do register accesses to their own controllers' DBI/register space without the RC providing a refclock are really becoming a pain... and the design and complexity of the PCI endpoint APIs is what suffers as a result. PERST could be asserted at any time. So for your system to not crash/hang by accessing registers in the controller, an EPF driver must be designed with great care to never do any register access when it is not safe... Perhaps the the EPF core should set the state (i.e. init_complete = false, even before calling the deinit callback in EPF driver, and perhaps even add safe-guards against init_complete in some APIs, so that e.g. a set_bar() call cannot trigger a crash because PERST# is asserted.. but even then, it could still be asserted in the middle of set_bar()'s execution.) Looking at the databook, it looks like core_clk is derived from pipe_clk output of the PHY. The PHY will either use external clock or internal clock. 4.6.2 DBI Protocol Transactions it looks like core_clk must be active to read/write the DBI. I really wish those controllers could e.g. change the clock temporarily using a mux, so that it could still perform DBI read/writes when there is not external refclk... Something like pm_sel_aux_clk selecting to use the aux clk instead of core_clk when in low power states. But I don't know the hardware well enough to know if that is possible for the DBI, so that might just be wishful thinking... Kind regards, Niklas
On Wed, Apr 10, 2024 at 04:52:18PM +0200, Niklas Cassel wrote: > On Wed, Apr 10, 2024 at 04:24:10PM +0530, Manivannan Sadhasivam wrote: > > > > Well, we could prevent the register access during PERST# assert time in the > > endpoint, but my worry is that we will end up with 2 version of the cleanup > > APIs. Lets take an example of dw_pcie_edma_remove() API which gets called > > during deinit and it touches some eDMA registers. > > > > So should we introduce another API which just clears the sw data structure and > > not touching the registers? And this may be needed for other generic APIs as > > well. > > I agree that it will be hard to come up with an elegant solution to this > problem. > > These endpoint controllers that cannot do register accesses to their own > controllers' DBI/register space without the RC providing a refclock are > really becoming a pain... and the design and complexity of the PCI endpoint > APIs is what suffers as a result. > > PERST could be asserted at any time. > So for your system to not crash/hang by accessing registers in the controller, > an EPF driver must be designed with great care to never do any register access > when it is not safe... > > Perhaps the the EPF core should set the state (i.e. init_complete = false, > even before calling the deinit callback in EPF driver, and perhaps even add > safe-guards against init_complete in some APIs, so that e.g. a set_bar() call > cannot trigger a crash because PERST# is asserted.. but even then, it could > still be asserted in the middle of set_bar()'s execution.) > > > Looking at the databook, it looks like core_clk is derived from pipe_clk > output of the PHY. The PHY will either use external clock or internal clock. > > 4.6.2 DBI Protocol Transactions > it looks like core_clk must be active to read/write the DBI. > > I really wish those controllers could e.g. change the clock temporarily > using a mux, so that it could still perform DBI read/writes when there is > not external refclk... Something like pm_sel_aux_clk selecting to use the > aux clk instead of core_clk when in low power states. > But I don't know the hardware well enough to know if that is possible for > the DBI, so that might just be wishful thinking... Looking at the rock5b SBC (rockchip rk3588), the PHY refclk can either be taken from -a PLL internally from the SoC. or -an external clock on the SBC. There does not seem to be an option to get the refclk as an input from the PCIe slot. (The refclk can only be output to the PCIe slot.) So when running two rock5b SBC, you cannot use a common clock for the RC and the EP side, you have to use a separate reference clock scheme, either SRNS or SRIS. Since I assume that you use two qcom platforms of the same model (I remember that you wrote that you usually test with qcom,sdx55-pcie-ep somewhere.) Surely this board must be able to supply a reference clock? (How else does it send this clock to the EP side?) So... why can't you run in SRNS or SRIS mode, where the EP provides it's own clock? Kind regards, Niklas
On Wed, Apr 10, 2024 at 11:10:01PM +0200, Niklas Cassel wrote: > On Wed, Apr 10, 2024 at 04:52:18PM +0200, Niklas Cassel wrote: > > On Wed, Apr 10, 2024 at 04:24:10PM +0530, Manivannan Sadhasivam wrote: > > > > > > Well, we could prevent the register access during PERST# assert time in the > > > endpoint, but my worry is that we will end up with 2 version of the cleanup > > > APIs. Lets take an example of dw_pcie_edma_remove() API which gets called > > > during deinit and it touches some eDMA registers. > > > > > > So should we introduce another API which just clears the sw data structure and > > > not touching the registers? And this may be needed for other generic APIs as > > > well. > > > > I agree that it will be hard to come up with an elegant solution to this > > problem. > > > > These endpoint controllers that cannot do register accesses to their own > > controllers' DBI/register space without the RC providing a refclock are > > really becoming a pain... and the design and complexity of the PCI endpoint > > APIs is what suffers as a result. > > > > PERST could be asserted at any time. > > So for your system to not crash/hang by accessing registers in the controller, > > an EPF driver must be designed with great care to never do any register access > > when it is not safe... > > > > Perhaps the the EPF core should set the state (i.e. init_complete = false, > > even before calling the deinit callback in EPF driver, and perhaps even add > > safe-guards against init_complete in some APIs, so that e.g. a set_bar() call > > cannot trigger a crash because PERST# is asserted.. but even then, it could > > still be asserted in the middle of set_bar()'s execution.) > > > > > > Looking at the databook, it looks like core_clk is derived from pipe_clk > > output of the PHY. The PHY will either use external clock or internal clock. > > > > 4.6.2 DBI Protocol Transactions > > it looks like core_clk must be active to read/write the DBI. > > > > I really wish those controllers could e.g. change the clock temporarily > > using a mux, so that it could still perform DBI read/writes when there is > > not external refclk... Something like pm_sel_aux_clk selecting to use the > > aux clk instead of core_clk when in low power states. > > But I don't know the hardware well enough to know if that is possible for > > the DBI, so that might just be wishful thinking... > > > Looking at the rock5b SBC (rockchip rk3588), the PHY refclk can either > be taken from > -a PLL internally from the SoC. > or > -an external clock on the SBC. > > There does not seem to be an option to get the refclk as an input from > the PCIe slot. (The refclk can only be output to the PCIe slot.) > > So when running two rock5b SBC, you cannot use a common clock for the RC > and the EP side, you have to use a separate reference clock scheme, > either SRNS or SRIS. > > Since I assume that you use two qcom platforms of the same model > (I remember that you wrote that you usually test with > qcom,sdx55-pcie-ep somewhere.) > Surely this board must be able to supply a reference clock? > (How else does it send this clock to the EP side?) > It is not the same model. It is the same SoC but with different base boards. FWIW, I'm testing with both SDX55 and SM8450 SoC based boards. So the Qcom EP has no way to generate refclk internally (I double checked this) and it has to rely on refclk from either host or a separate clock source. But in most of the board designs they connect to host refclk by default. > So... why can't you run in SRNS or SRIS mode, where the EP provides > it's own clock? > As per the discussion I had with Qcom PCIe team, they that said SRIS is only available at the host side to supply independent clock to the EP. But that requires changes to the PHY sequence and could also result in preformance drop. Also this mode is available only on new SoCs but not on older ones like SDX55. So I don't think this is a viable option. - Mani
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c index 14772edcf0d3..b2803978c0ad 100644 --- a/drivers/pci/controller/dwc/pcie-qcom.c +++ b/drivers/pci/controller/dwc/pcie-qcom.c @@ -1655,6 +1655,13 @@ static int qcom_pcie_resume_noirq(struct device *dev) return 0; } +static void qcom_pcie_shutdown(struct platform_device *pdev) +{ + struct qcom_pcie *pcie = platform_get_drvdata(pdev); + + qcom_pcie_host_deinit(&pcie->pci->pp); +} + static const struct of_device_id qcom_pcie_match[] = { { .compatible = "qcom,pcie-apq8064", .data = &cfg_2_1_0 }, { .compatible = "qcom,pcie-apq8084", .data = &cfg_1_0_0 }, @@ -1708,5 +1715,6 @@ static struct platform_driver qcom_pcie_driver = { .pm = &qcom_pcie_pm_ops, .probe_type = PROBE_PREFER_ASYNCHRONOUS, }, + .shutdown = qcom_pcie_shutdown, }; builtin_platform_driver(qcom_pcie_driver);
PCIe host controller drivers are supposed to properly reset the endpoint devices during host shutdown/reboot. Currently, Qcom driver doesn't do anything during host shutdown/reboot, resulting in both PERST# and refclk getting disabled at the same time. This prevents the endpoint device firmware to properly reset the state machine. Because, if the refclk is cutoff immediately along with PERST#, access to device specific registers within the endpoint will result in a firmware crash. To address this issue, let's call qcom_pcie_host_deinit() inside the shutdown callback, that asserts PERST# and then cuts off the refclk with a delay of 1ms, thus allowing the endpoint device firmware to properly cleanup the state machine. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> --- drivers/pci/controller/dwc/pcie-qcom.c | 8 ++++++++ 1 file changed, 8 insertions(+)