diff mbox series

[04/23] interconnect: imx: fix registration race

Message ID 20230201101559.15529-5-johan+linaro@kernel.org
State Superseded
Headers show
Series interconnect: fix racy provider registration | expand

Commit Message

Johan Hovold Feb. 1, 2023, 10:15 a.m. UTC
The current interconnect provider registration interface is inherently
racy as nodes are not added until the after adding the provider. This
can specifically cause racing DT lookups to fail.

Switch to using the new API where the provider is not registered until
after it has been fully initialised.

Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
Cc: stable@vger.kernel.org      # 5.8
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Cc: Alexandre Bailon <abailon@baylibre.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 drivers/interconnect/imx/imx.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Comments

Luca Ceresoli Feb. 3, 2023, 4:01 p.m. UTC | #1
Hello Johan,

On Wed,  1 Feb 2023 11:15:40 +0100
Johan Hovold <johan+linaro@kernel.org> wrote:

> The current interconnect provider registration interface is inherently
> racy as nodes are not added until the after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> Cc: stable@vger.kernel.org      # 5.8
> Cc: Leonard Crestez <leonard.crestez@nxp.com>
> Cc: Alexandre Bailon <abailon@baylibre.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>

Georgi pointed me to this series after I reported a bug yesterday [0],
that I found on iMX8MP. So I ran some tests with my original, failing
tree, minus one patch with my debugging code to hunt for the bug, plus
patches 1-4 of this series.

The original code was failing approx 5~10% of the times. With your 4
patches applied it ran 139 times with zero errors, which looks great! I
won't be able to do more testing until next Monday to be extra sure.

[0]
https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
Johan Hovold Feb. 6, 2023, 8:09 a.m. UTC | #2
On Fri, Feb 03, 2023 at 05:01:21PM +0100, Luca Ceresoli wrote:
> Hello Johan,
> 
> On Wed,  1 Feb 2023 11:15:40 +0100
> Johan Hovold <johan+linaro@kernel.org> wrote:
> 
> > The current interconnect provider registration interface is inherently
> > racy as nodes are not added until the after adding the provider. This
> > can specifically cause racing DT lookups to fail.
> > 
> > Switch to using the new API where the provider is not registered until
> > after it has been fully initialised.
> > 
> > Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> > Cc: stable@vger.kernel.org      # 5.8
> > Cc: Leonard Crestez <leonard.crestez@nxp.com>
> > Cc: Alexandre Bailon <abailon@baylibre.com>
> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> 
> Georgi pointed me to this series after I reported a bug yesterday [0],
> that I found on iMX8MP. So I ran some tests with my original, failing
> tree, minus one patch with my debugging code to hunt for the bug, plus
> patches 1-4 of this series.
> 
> The original code was failing approx 5~10% of the times. With your 4
> patches applied it ran 139 times with zero errors, which looks great! I
> won't be able to do more testing until next Monday to be extra sure.

Thanks for testing.

It indeed looks like you're hitting the same race, and as the imx
interconnect driver also initialises the provider data num_nodes count
before adding the nodes it results in that NULL-deref (where the qcom
driver failed a bit more gracefully).

Johan

> [0]
> https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
Luca Ceresoli Feb. 6, 2023, 8:52 p.m. UTC | #3
Hello Johan,

On Mon, 6 Feb 2023 09:09:50 +0100
Johan Hovold <johan@kernel.org> wrote:

> On Fri, Feb 03, 2023 at 05:01:21PM +0100, Luca Ceresoli wrote:
> > Hello Johan,
> > 
> > On Wed,  1 Feb 2023 11:15:40 +0100
> > Johan Hovold <johan+linaro@kernel.org> wrote:
> >   
> > > The current interconnect provider registration interface is inherently
> > > racy as nodes are not added until the after adding the provider. This
> > > can specifically cause racing DT lookups to fail.
> > > 
> > > Switch to using the new API where the provider is not registered until
> > > after it has been fully initialised.
> > > 
> > > Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> > > Cc: stable@vger.kernel.org      # 5.8
> > > Cc: Leonard Crestez <leonard.crestez@nxp.com>
> > > Cc: Alexandre Bailon <abailon@baylibre.com>
> > > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>  
> > 
> > Georgi pointed me to this series after I reported a bug yesterday [0],
> > that I found on iMX8MP. So I ran some tests with my original, failing
> > tree, minus one patch with my debugging code to hunt for the bug, plus
> > patches 1-4 of this series.
> > 
> > The original code was failing approx 5~10% of the times. With your 4
> > patches applied it ran 139 times with zero errors, which looks great! I
> > won't be able to do more testing until next Monday to be extra sure.  
> 
> Thanks for testing.
> 
> It indeed looks like you're hitting the same race, and as the imx
> interconnect driver also initialises the provider data num_nodes count
> before adding the nodes it results in that NULL-deref (where the qcom
> driver failed a bit more gracefully).

My v6.2-rc5 tree with patches 1 to 4 added has booted 590 times with 0
errors, which add to the 139 times on Friday. This definitely deserves
my:

Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
diff mbox series

Patch

diff --git a/drivers/interconnect/imx/imx.c b/drivers/interconnect/imx/imx.c
index 823d9be9771a..979ed610f704 100644
--- a/drivers/interconnect/imx/imx.c
+++ b/drivers/interconnect/imx/imx.c
@@ -295,6 +295,9 @@  int imx_icc_register(struct platform_device *pdev,
 	provider->xlate = of_icc_xlate_onecell;
 	provider->data = data;
 	provider->dev = dev->parent;
+
+	icc_provider_init(provider);
+
 	platform_set_drvdata(pdev, imx_provider);
 
 	if (settings) {
@@ -306,20 +309,18 @@  int imx_icc_register(struct platform_device *pdev,
 		}
 	}
 
-	ret = icc_provider_add(provider);
-	if (ret) {
-		dev_err(dev, "error adding interconnect provider: %d\n", ret);
+	ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
+	if (ret)
 		return ret;
-	}
 
-	ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
+	ret = icc_provider_register(provider);
 	if (ret)
-		goto provider_del;
+		goto err_unregister_nodes;
 
 	return 0;
 
-provider_del:
-	icc_provider_del(provider);
+err_unregister_nodes:
+	imx_icc_unregister_nodes(&imx_provider->provider);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(imx_icc_register);
@@ -328,9 +329,8 @@  void imx_icc_unregister(struct platform_device *pdev)
 {
 	struct imx_icc_provider *imx_provider = platform_get_drvdata(pdev);
 
+	icc_provider_deregister(&imx_provider->provider);
 	imx_icc_unregister_nodes(&imx_provider->provider);
-
-	icc_provider_del(&imx_provider->provider);
 }
 EXPORT_SYMBOL_GPL(imx_icc_unregister);