Message ID | 20230201101559.15529-5-johan+linaro@kernel.org |
---|---|
State | Superseded |
Headers | show |
Series | interconnect: fix racy provider registration | expand |
Hello Johan, On Wed, 1 Feb 2023 11:15:40 +0100 Johan Hovold <johan+linaro@kernel.org> wrote: > The current interconnect provider registration interface is inherently > racy as nodes are not added until the after adding the provider. This > can specifically cause racing DT lookups to fail. > > Switch to using the new API where the provider is not registered until > after it has been fully initialised. > > Fixes: f0d8048525d7 ("interconnect: Add imx core driver") > Cc: stable@vger.kernel.org # 5.8 > Cc: Leonard Crestez <leonard.crestez@nxp.com> > Cc: Alexandre Bailon <abailon@baylibre.com> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Georgi pointed me to this series after I reported a bug yesterday [0], that I found on iMX8MP. So I ran some tests with my original, failing tree, minus one patch with my debugging code to hunt for the bug, plus patches 1-4 of this series. The original code was failing approx 5~10% of the times. With your 4 patches applied it ran 139 times with zero errors, which looks great! I won't be able to do more testing until next Monday to be extra sure. [0] https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
On Fri, Feb 03, 2023 at 05:01:21PM +0100, Luca Ceresoli wrote: > Hello Johan, > > On Wed, 1 Feb 2023 11:15:40 +0100 > Johan Hovold <johan+linaro@kernel.org> wrote: > > > The current interconnect provider registration interface is inherently > > racy as nodes are not added until the after adding the provider. This > > can specifically cause racing DT lookups to fail. > > > > Switch to using the new API where the provider is not registered until > > after it has been fully initialised. > > > > Fixes: f0d8048525d7 ("interconnect: Add imx core driver") > > Cc: stable@vger.kernel.org # 5.8 > > Cc: Leonard Crestez <leonard.crestez@nxp.com> > > Cc: Alexandre Bailon <abailon@baylibre.com> > > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > > Georgi pointed me to this series after I reported a bug yesterday [0], > that I found on iMX8MP. So I ran some tests with my original, failing > tree, minus one patch with my debugging code to hunt for the bug, plus > patches 1-4 of this series. > > The original code was failing approx 5~10% of the times. With your 4 > patches applied it ran 139 times with zero errors, which looks great! I > won't be able to do more testing until next Monday to be extra sure. Thanks for testing. It indeed looks like you're hitting the same race, and as the imx interconnect driver also initialises the provider data num_nodes count before adding the nodes it results in that NULL-deref (where the qcom driver failed a bit more gracefully). Johan > [0] > https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
Hello Johan, On Mon, 6 Feb 2023 09:09:50 +0100 Johan Hovold <johan@kernel.org> wrote: > On Fri, Feb 03, 2023 at 05:01:21PM +0100, Luca Ceresoli wrote: > > Hello Johan, > > > > On Wed, 1 Feb 2023 11:15:40 +0100 > > Johan Hovold <johan+linaro@kernel.org> wrote: > > > > > The current interconnect provider registration interface is inherently > > > racy as nodes are not added until the after adding the provider. This > > > can specifically cause racing DT lookups to fail. > > > > > > Switch to using the new API where the provider is not registered until > > > after it has been fully initialised. > > > > > > Fixes: f0d8048525d7 ("interconnect: Add imx core driver") > > > Cc: stable@vger.kernel.org # 5.8 > > > Cc: Leonard Crestez <leonard.crestez@nxp.com> > > > Cc: Alexandre Bailon <abailon@baylibre.com> > > > Signed-off-by: Johan Hovold <johan+linaro@kernel.org> > > > > Georgi pointed me to this series after I reported a bug yesterday [0], > > that I found on iMX8MP. So I ran some tests with my original, failing > > tree, minus one patch with my debugging code to hunt for the bug, plus > > patches 1-4 of this series. > > > > The original code was failing approx 5~10% of the times. With your 4 > > patches applied it ran 139 times with zero errors, which looks great! I > > won't be able to do more testing until next Monday to be extra sure. > > Thanks for testing. > > It indeed looks like you're hitting the same race, and as the imx > interconnect driver also initialises the provider data num_nodes count > before adding the nodes it results in that NULL-deref (where the qcom > driver failed a bit more gracefully). My v6.2-rc5 tree with patches 1 to 4 added has booted 590 times with 0 errors, which add to the 139 times on Friday. This definitely deserves my: Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
diff --git a/drivers/interconnect/imx/imx.c b/drivers/interconnect/imx/imx.c index 823d9be9771a..979ed610f704 100644 --- a/drivers/interconnect/imx/imx.c +++ b/drivers/interconnect/imx/imx.c @@ -295,6 +295,9 @@ int imx_icc_register(struct platform_device *pdev, provider->xlate = of_icc_xlate_onecell; provider->data = data; provider->dev = dev->parent; + + icc_provider_init(provider); + platform_set_drvdata(pdev, imx_provider); if (settings) { @@ -306,20 +309,18 @@ int imx_icc_register(struct platform_device *pdev, } } - ret = icc_provider_add(provider); - if (ret) { - dev_err(dev, "error adding interconnect provider: %d\n", ret); + ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings); + if (ret) return ret; - } - ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings); + ret = icc_provider_register(provider); if (ret) - goto provider_del; + goto err_unregister_nodes; return 0; -provider_del: - icc_provider_del(provider); +err_unregister_nodes: + imx_icc_unregister_nodes(&imx_provider->provider); return ret; } EXPORT_SYMBOL_GPL(imx_icc_register); @@ -328,9 +329,8 @@ void imx_icc_unregister(struct platform_device *pdev) { struct imx_icc_provider *imx_provider = platform_get_drvdata(pdev); + icc_provider_deregister(&imx_provider->provider); imx_icc_unregister_nodes(&imx_provider->provider); - - icc_provider_del(&imx_provider->provider); } EXPORT_SYMBOL_GPL(imx_icc_unregister);
The current interconnect provider registration interface is inherently racy as nodes are not added until the after adding the provider. This can specifically cause racing DT lookups to fail. Switch to using the new API where the provider is not registered until after it has been fully initialised. Fixes: f0d8048525d7 ("interconnect: Add imx core driver") Cc: stable@vger.kernel.org # 5.8 Cc: Leonard Crestez <leonard.crestez@nxp.com> Cc: Alexandre Bailon <abailon@baylibre.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> --- drivers/interconnect/imx/imx.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)