
[00/23] interconnect: fix racy provider registration

Message ID 20230201101559.15529-1-johan+linaro@kernel.org

Message

Johan Hovold Feb. 1, 2023, 10:15 a.m. UTC
The current interconnect provider interface is inherently racy as
providers are expected to be registered before being fully initialised.

This can specifically cause racing DT lookups to fail as I recently
noticed when the Qualcomm cpufreq driver failed to probe:

	of_icc_xlate_onecell: invalid index 0
	cpu cpu0: error -EINVAL: error finding src node
	cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22
	qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22

This only happens very rarely, but the bug is easily reproduced by
increasing the race window by adding an msleep() after registering
the osm-l3 interconnect provider.

Note that the Qualcomm cpufreq driver is especially susceptible to this
race as the interconnect path is looked up from the CPU nodes so that
the driver core does not guarantee the probe order even when device
links are enabled (which they are not always).

This series adds a new interconnect provider registration API which is
used to fix up the interconnect drivers before removing the old racy
API.

Included are also a number of fixes for other bugs found while preparing
the series.

Johan
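
The ordering problem described in the cover letter above can be sketched
in plain userspace C. This is only a model under stated assumptions, not
the kernel implementation: `struct model_provider`, the `model_*` helpers
and the `MODEL_*` return codes are hypothetical names invented for the
illustration. A provider that is published before its nodes exist lets a
racing lookup fail with an invalid-index error, while a provider that is
initialised, populated and only then registered makes the same lookup see
"not there yet" (assumed here to correspond to a lookup that is retried
later) instead of a hard failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES	4
#define MODEL_EINVAL	(-22)	/* mirrors the -22 seen in the log above */
#define MODEL_DEFER	(-517)	/* hypothetical "provider not visible yet, retry later" */

/* Hypothetical stand-in for a provider; not the kernel's struct icc_provider. */
struct model_provider {
	int node_ids[MODEL_MAX_NODES];
	size_t num_nodes;
	bool registered;	/* lookups can only see the provider when true */
};

/* Prepare the provider without making it visible to lookups. */
static void model_provider_init(struct model_provider *p)
{
	p->num_nodes = 0;
	p->registered = false;
}

static int model_node_add(struct model_provider *p, int id)
{
	if (p->num_nodes >= MODEL_MAX_NODES)
		return MODEL_EINVAL;
	p->node_ids[p->num_nodes++] = id;
	return 0;
}

/* Publish the provider so that lookups can find it. */
static void model_provider_register(struct model_provider *p)
{
	p->registered = true;
}

/* Consumer-side lookup of node number idx. */
static int model_lookup(const struct model_provider *p, size_t idx)
{
	if (!p->registered)
		return MODEL_DEFER;
	if (idx >= p->num_nodes)
		return MODEL_EINVAL;	/* the "invalid index 0" case */
	return p->node_ids[idx];
}

/* Old ordering: publish first, add nodes afterwards. */
static int old_order_lookup_in_window(void)
{
	struct model_provider p;
	int seen;

	model_provider_init(&p);
	model_provider_register(&p);	/* visible while still empty */
	seen = model_lookup(&p, 0);	/* racing consumer */
	model_node_add(&p, 42);
	return seen;
}

/* New ordering: init, add nodes, publish last. */
static int new_order_lookup_in_window(void)
{
	struct model_provider p;
	int seen;

	model_provider_init(&p);
	seen = model_lookup(&p, 0);	/* racing consumer */
	model_node_add(&p, 42);
	model_provider_register(&p);
	assert(model_lookup(&p, 0) == 42);	/* complete once visible */
	return seen;
}
```

Calling `old_order_lookup_in_window()` yields the hard error, whereas
`new_order_lookup_in_window()` yields the retry code. The model is
single-threaded on purpose: it places the "racing" lookup at the worst
possible point instead of relying on timing.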


Johan Hovold (23):
  interconnect: fix mem leak when freeing nodes
  interconnect: fix icc_provider_del() error handling
  interconnect: fix provider registration API
  interconnect: imx: fix registration race
  interconnect: qcom: osm-l3: fix registration race
  interconnect: qcom: rpm: fix probe child-node error handling
  interconnect: qcom: rpm: fix probe PM domain error handling
  interconnect: qcom: rpm: fix registration race
  interconnect: qcom: rpmh: fix probe child-node error handling
  interconnect: qcom: rpmh: fix registration race
  interconnect: qcom: msm8974: fix registration race
  interconnect: qcom: sm8450: fix registration race
  interconnect: qcom: sm8550: fix registration race
  interconnect: exynos: fix node leak in probe PM QoS error path
  interconnect: exynos: fix registration race
  interconnect: exynos: drop redundant link destroy
  memory: tegra: fix interconnect registration race
  memory: tegra124-emc: fix interconnect registration race
  memory: tegra20-emc: fix interconnect registration race
  memory: tegra30-emc: fix interconnect registration race
  interconnect: drop racy registration API
  interconnect: drop unused icc_get() interface
  interconnect: drop unused icc_link_destroy() interface

 drivers/interconnect/core.c           | 149 +++++---------------------
 drivers/interconnect/imx/imx.c        |  20 ++--
 drivers/interconnect/qcom/icc-rpm.c   |  33 +++---
 drivers/interconnect/qcom/icc-rpmh.c  |  30 ++++--
 drivers/interconnect/qcom/msm8974.c   |  20 ++--
 drivers/interconnect/qcom/osm-l3.c    |  14 ++-
 drivers/interconnect/qcom/sm8450.c    |  22 ++--
 drivers/interconnect/qcom/sm8550.c    |  22 ++--
 drivers/interconnect/samsung/exynos.c |  30 +++---
 drivers/memory/tegra/mc.c             |  16 ++-
 drivers/memory/tegra/tegra124-emc.c   |  12 +--
 drivers/memory/tegra/tegra20-emc.c    |  12 +--
 drivers/memory/tegra/tegra30-emc.c    |  12 +--
 include/linux/interconnect-provider.h |  19 ++--
 include/linux/interconnect.h          |   8 --
 15 files changed, 154 insertions(+), 265 deletions(-)

Comments

Abel Vesa Feb. 1, 2023, 10:20 a.m. UTC | #1
On 23-02-01 11:15:49, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.

Sounds good to me.

> 
> Fixes: e6f0d6a30f73 ("interconnect: qcom: Add SM8550 interconnect provider driver")
> Cc: Abel Vesa <abel.vesa@linaro.org>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>

Reviewed-by: Abel Vesa <abel.vesa@linaro.org>

> ---
>  drivers/interconnect/qcom/sm8550.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/interconnect/qcom/sm8550.c b/drivers/interconnect/qcom/sm8550.c
> index 54fa027ab961..7ab492ca8fe0 100644
> --- a/drivers/interconnect/qcom/sm8550.c
> +++ b/drivers/interconnect/qcom/sm8550.c
> @@ -2197,9 +2197,10 @@ static int qnoc_probe(struct platform_device *pdev)
>  	provider->pre_aggregate = qcom_icc_pre_aggregate;
>  	provider->aggregate = qcom_icc_aggregate;
>  	provider->xlate_extended = qcom_icc_xlate_extended;
> -	INIT_LIST_HEAD(&provider->nodes);
>  	provider->data = data;
>  
> +	icc_provider_init(provider);
> +
>  	qp->dev = &pdev->dev;
>  	qp->bcms = desc->bcms;
>  	qp->num_bcms = desc->num_bcms;
> @@ -2208,12 +2209,6 @@ static int qnoc_probe(struct platform_device *pdev)
>  	if (IS_ERR(qp->voter))
>  		return PTR_ERR(qp->voter);
>  
> -	ret = icc_provider_add(provider);
> -	if (ret) {
> -		dev_err_probe(&pdev->dev, ret,
> -			      "error adding interconnect provider\n");
> -		return ret;
> -	}
>  
>  	for (i = 0; i < qp->num_bcms; i++)
>  		qcom_icc_bcm_init(qp->bcms[i], &pdev->dev);
> @@ -2227,7 +2222,7 @@ static int qnoc_probe(struct platform_device *pdev)
>  		node = icc_node_create(qnodes[i]->id);
>  		if (IS_ERR(node)) {
>  			ret = PTR_ERR(node);
> -			goto err;
> +			goto err_remove_nodes;
>  		}
>  
>  		node->name = qnodes[i]->name;
> @@ -2241,12 +2236,17 @@ static int qnoc_probe(struct platform_device *pdev)
>  	}
>  	data->num_nodes = num_nodes;
>  
> +	ret = icc_provider_register(provider);
> +	if (ret)
> +		goto err_remove_nodes;
> +
>  	platform_set_drvdata(pdev, qp);
>  
>  	return 0;
> -err:
> +
> +err_remove_nodes:
>  	icc_nodes_remove(provider);
> -	icc_provider_del(provider);
> +
>  	return ret;
>  }
>  
> @@ -2254,8 +2254,8 @@ static int qnoc_remove(struct platform_device *pdev)
>  {
>  	struct qcom_icc_provider *qp = platform_get_drvdata(pdev);
>  
> +	icc_provider_deregister(&qp->provider);
>  	icc_nodes_remove(&qp->provider);
> -	icc_provider_del(&qp->provider);
>  
>  	return 0;
>  }
> -- 
> 2.39.1
>
Konrad Dybcio Feb. 1, 2023, 11:18 a.m. UTC | #2
On 1.02.2023 11:15, Johan Hovold wrote:
> The node link array is allocated when adding links to a node but is not
> deallocated when nodes are destroyed.
> 
> Fixes: 11f1ceca7031 ("interconnect: Add generic on-chip interconnect API")
> Cc: stable@vger.kernel.org      # 5.1
> Cc: Georgi Djakov <georgi.djakov@linaro.org>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
>  drivers/interconnect/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 423f875d4b54..dc61620a0191 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -850,6 +850,7 @@ void icc_node_destroy(int id)
>  
>  	mutex_unlock(&icc_lock);
>  
> +	kfree(node->links);
>  	kfree(node);
>  }
>  EXPORT_SYMBOL_GPL(icc_node_destroy);
Krzysztof Kozlowski Feb. 2, 2023, 10:58 a.m. UTC | #3
On 01/02/2023 11:15, Johan Hovold wrote:
> Make sure to add the newly allocated interconnect node to the provider
> before adding the PM QoS request so that the node is freed on errors.
> 
> Fixes: 2f95b9d5cf0b ("interconnect: Add generic interconnect driver for Exynos SoCs")
> Cc: stable@vger.kernel.org      # 5.11
> Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/interconnect/samsung/exynos.c | 6 +++---


Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Best regards,
Krzysztof
Krzysztof Kozlowski Feb. 2, 2023, 11:04 a.m. UTC | #4
On 01/02/2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to trigger a NULL-pointer
> dereference when either a NULL pointer or a not fully initialised node
> is returned from exynos_generic_icc_xlate().
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: 2f95b9d5cf0b ("interconnect: Add generic interconnect driver for Exynos SoCs")
> Cc: stable@vger.kernel.org      # 5.11
> Cc: Sylwester Nawrocki <s.nawrocki@samsung.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/interconnect/samsung/exynos.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/interconnect/samsung/exynos.c b/drivers/interconnect/samsung/exynos.c
> index e70665899482..72e42603823b 100644
> --- a/drivers/interconnect/samsung/exynos.c
> +++ b/drivers/interconnect/samsung/exynos.c
> @@ -98,12 +98,13 @@ static int exynos_generic_icc_remove(struct platform_device *pdev)
>  	struct exynos_icc_priv *priv = platform_get_drvdata(pdev);
>  	struct icc_node *parent_node, *node = priv->node;
>  
> +	icc_provider_deregister(&priv->provider);
> +
>  	parent_node = exynos_icc_get_parent(priv->dev->parent->of_node);
>  	if (parent_node && !IS_ERR(parent_node))
>  		icc_link_destroy(node, parent_node);
>  
>  	icc_nodes_remove(&priv->provider);
> -	icc_provider_del(&priv->provider);
>  
>  	return 0;
>  }
> @@ -132,15 +133,11 @@ static int exynos_generic_icc_probe(struct platform_device *pdev)
>  	provider->inter_set = true;
>  	provider->data = priv;
>  
> -	ret = icc_provider_add(provider);
> -	if (ret < 0)
> -		return ret;
> +	icc_provider_init(provider);
>  
>  	icc_node = icc_node_create(pdev->id);
> -	if (IS_ERR(icc_node)) {
> -		ret = PTR_ERR(icc_node);
> -		goto err_prov_del;
> -	}
> +	if (IS_ERR(icc_node))
> +		return PTR_ERR(icc_node);
>  
>  	priv->node = icc_node;
>  	icc_node->name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%pOFn",
> @@ -171,14 +168,17 @@ static int exynos_generic_icc_probe(struct platform_device *pdev)
>  			goto err_pmqos_del;
>  	}
>  
> +	ret = icc_provider_register(provider);
> +	if (ret < 0)
> +		goto err_pmqos_del;

If I understand correctly there is no need for icc_link_destroy() in
error path here, right? Even in case of probe retry (defer or whatever
reason) - the link will be removed with icc_nodes_remove()?

Best regards,
Krzysztof
Krzysztof Kozlowski Feb. 2, 2023, 11:09 a.m. UTC | #5
On 01/02/2023 11:15, Johan Hovold wrote:
> There is no longer any need to explicitly destroy node links as this is
> now done when the node is destroyed as part of icc_nodes_remove().
> 
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/interconnect/samsung/exynos.c | 6 ------
>  1 file changed, 6 deletions(-)


Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>


Best regards,
Krzysztof
Krzysztof Kozlowski Feb. 2, 2023, 11:13 a.m. UTC | #6
On 01/02/2023 11:15, Johan Hovold wrote:
> The current interconnect provider interface is inherently racy as
> providers are expected to be registered before being fully initialised.
> 
> This can specifically cause racing DT lookups to fail as I recently
> noticed when the Qualcomm cpufreq driver failed to probe:
> 
> 	of_icc_xlate_onecell: invalid index 0
>         cpu cpu0: error -EINVAL: error finding src node
>         cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22
>         qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22
> 
> This only happens very rarely, but the bug is easily reproduced by
> increasing the race window by adding an msleep() after registering
> the osm-l3 interconnect provider.
> 
> Note that the Qualcomm cpufreq driver is especially susceptible to this
> race as the interconnect path is looked up from the CPU nodes so that
> the driver core does not guarantee the probe order even when device
> links are enabled (which they are not always).
> 
> This series adds a new interconnect provider registration API which is
> used to fix up the interconnect drivers before removing the old racy
> API.
> 

So is there a dependency or not? Can you make it clear that I shouldn't
take memory controller bits?

Best regards,
Krzysztof
Johan Hovold Feb. 2, 2023, 12:17 p.m. UTC | #7
On Thu, Feb 02, 2023 at 12:04:49PM +0100, Krzysztof Kozlowski wrote:
> On 01/02/2023 11:15, Johan Hovold wrote:

> > @@ -98,12 +98,13 @@ static int exynos_generic_icc_remove(struct platform_device *pdev)
> >  	struct exynos_icc_priv *priv = platform_get_drvdata(pdev);
> >  	struct icc_node *parent_node, *node = priv->node;
> >  
> > +	icc_provider_deregister(&priv->provider);
> > +
> >  	parent_node = exynos_icc_get_parent(priv->dev->parent->of_node);
> >  	if (parent_node && !IS_ERR(parent_node))
> >  		icc_link_destroy(node, parent_node);
> >  
> >  	icc_nodes_remove(&priv->provider);
> > -	icc_provider_del(&priv->provider);
> >  
> >  	return 0;
> >  }
> > @@ -132,15 +133,11 @@ static int exynos_generic_icc_probe(struct platform_device *pdev)
> >  	provider->inter_set = true;
> >  	provider->data = priv;
> >  
> > -	ret = icc_provider_add(provider);
> > -	if (ret < 0)
> > -		return ret;
> > +	icc_provider_init(provider);
> >  
> >  	icc_node = icc_node_create(pdev->id);
> > -	if (IS_ERR(icc_node)) {
> > -		ret = PTR_ERR(icc_node);
> > -		goto err_prov_del;
> > -	}
> > +	if (IS_ERR(icc_node))
> > +		return PTR_ERR(icc_node);
> >  
> >  	priv->node = icc_node;
> >  	icc_node->name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%pOFn",
> > @@ -171,14 +168,17 @@ static int exynos_generic_icc_probe(struct platform_device *pdev)
> >  			goto err_pmqos_del;
> >  	}
> >  
> > +	ret = icc_provider_register(provider);
> > +	if (ret < 0)
> > +		goto err_pmqos_del;
> 
> If I understand correctly there is no need for icc_link_destroy() in
> error path here, right? Even in case of probe retry (defer or whatever
> reason) - the link will be removed with icc_nodes_remove()?

Correct, it is no longer needed after the first patch in this series.

The exynos driver was the only driver that bothered to remove links
explicitly; all the others expected the interconnect framework to do so
when destroying nodes, even though that was not the case until now.

Johan
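
The point made in the reply above, that the link array now goes away
together with the node, can be illustrated with a small userspace model.
This is a sketch under assumptions, not the kernel code: `struct
model_node`, the `model_*` helpers and the `live_blocks` counter are
hypothetical names, and the `free_links` flag exists only to contrast the
behaviour before and after the first patch of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of heap blocks currently owned by the model. */
static int live_blocks;

/* Hypothetical stand-in for a node with a growable link array. */
struct model_node {
	struct model_node **links;
	size_t num_links;
};

static struct model_node *model_node_create(void)
{
	struct model_node *node = calloc(1, sizeof(*node));

	if (node)
		live_blocks++;
	return node;
}

/* Append dst to src's link array, allocating the array on first use. */
static int model_link_create(struct model_node *src, struct model_node *dst)
{
	struct model_node **links;

	links = realloc(src->links, (src->num_links + 1) * sizeof(*links));
	if (!links)
		return -1;
	if (!src->links)
		live_blocks++;	/* first link allocates the array */
	src->links = links;
	src->links[src->num_links++] = dst;
	return 0;
}

/*
 * Destroy a node. free_links == 0 models the old behaviour (the array
 * is forgotten); free_links != 0 models the behaviour with the fix.
 */
static void model_node_destroy(struct model_node *node, int free_links)
{
	if (free_links && node->links) {
		free(node->links);
		live_blocks--;
	}
	free(node);
	live_blocks--;
}

/* Build two linked nodes, tear them down, report outstanding blocks. */
static int model_leaked_blocks(int free_links)
{
	struct model_node *a, *b;

	live_blocks = 0;
	a = model_node_create();
	b = model_node_create();
	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}

	model_link_create(a, b);
	model_link_create(a, b);

	model_node_destroy(a, free_links);
	model_node_destroy(b, free_links);
	return live_blocks;
}
```

With `free_links` set, nothing is left behind and a driver has no reason
to tear down links by hand; with it cleared, one block (the link array of
the first node) stays allocated, which is the leak the first patch
addresses.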
Johan Hovold Feb. 2, 2023, 12:20 p.m. UTC | #8
On Thu, Feb 02, 2023 at 12:13:33PM +0100, Krzysztof Kozlowski wrote:
> On 01/02/2023 11:15, Johan Hovold wrote:
> > The current interconnect provider interface is inherently racy as
> > providers are expected to be registered before being fully initialised.
> > 
> > This can specifically cause racing DT lookups to fail as I recently
> > noticed when the Qualcomm cpufreq driver failed to probe:
> > 
> > 	of_icc_xlate_onecell: invalid index 0
> >         cpu cpu0: error -EINVAL: error finding src node
> >         cpu cpu0: dev_pm_opp_of_find_icc_paths: Unable to get path0: -22
> >         qcom-cpufreq-hw: probe of 18591000.cpufreq failed with error -22
> > 
> > This only happens very rarely, but the bug is easily reproduced by
> > increasing the race window by adding an msleep() after registering
> > the osm-l3 interconnect provider.
> > 
> > Note that the Qualcomm cpufreq driver is especially susceptible to this
> > race as the interconnect path is looked up from the CPU nodes so that
> > the driver core does not guarantee the probe order even when device
> > links are enabled (which they are not always).
> > 
> > This series adds a new interconnect provider registration API which is
> > used to fix up the interconnect drivers before removing the old racy
> > API.
> > 
> 
> So is there a dependency or not? Can you make it clear that I shouldn't
> take memory controller bits?

As the fixes depend on the new API it is best if these could all go
through Georgi's tree.

Johan
Krzysztof Kozlowski Feb. 2, 2023, 12:20 p.m. UTC | #9
On 02/02/2023 13:17, Johan Hovold wrote:
> On Thu, Feb 02, 2023 at 12:04:49PM +0100, Krzysztof Kozlowski wrote:
>> On 01/02/2023 11:15, Johan Hovold wrote:
> 


Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Best regards,
Krzysztof
Krzysztof Kozlowski Feb. 2, 2023, 12:21 p.m. UTC | #10
On 01/02/2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: 06f079816d4c ("memory: tegra-mc: Add interconnect framework")
> Cc: stable@vger.kernel.org      # 5.11
> Cc: Dmitry Osipenko <digetx@gmail.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>


Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

(or tell me if I should take it via memory-controllers)

Best regards,
Krzysztof
Krzysztof Kozlowski Feb. 2, 2023, 12:21 p.m. UTC | #11
On 01/02/2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: d5ef16ba5fbe ("memory: tegra20: Support interconnect framework")
> Cc: stable@vger.kernel.org      # 5.11
> Cc: Dmitry Osipenko <digetx@gmail.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>


Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Best regards,
Krzysztof
Konrad Dybcio Feb. 3, 2023, 2:49 a.m. UTC | #12
On 1.02.2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> Cc: stable@vger.kernel.org      # 5.8
> Cc: Leonard Crestez <leonard.crestez@nxp.com>
> Cc: Alexandre Bailon <abailon@baylibre.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
>  drivers/interconnect/imx/imx.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/interconnect/imx/imx.c b/drivers/interconnect/imx/imx.c
> index 823d9be9771a..979ed610f704 100644
> --- a/drivers/interconnect/imx/imx.c
> +++ b/drivers/interconnect/imx/imx.c
> @@ -295,6 +295,9 @@ int imx_icc_register(struct platform_device *pdev,
>  	provider->xlate = of_icc_xlate_onecell;
>  	provider->data = data;
>  	provider->dev = dev->parent;
> +
> +	icc_provider_init(provider);
> +
>  	platform_set_drvdata(pdev, imx_provider);
>  
>  	if (settings) {
> @@ -306,20 +309,18 @@ int imx_icc_register(struct platform_device *pdev,
>  		}
>  	}
>  
> -	ret = icc_provider_add(provider);
> -	if (ret) {
> -		dev_err(dev, "error adding interconnect provider: %d\n", ret);
> +	ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
> +	if (ret)
>  		return ret;
> -	}
>  
> -	ret = imx_icc_register_nodes(imx_provider, nodes, nodes_count, settings);
> +	ret = icc_provider_register(provider);
>  	if (ret)
> -		goto provider_del;
> +		goto err_unregister_nodes;
>  
>  	return 0;
>  
> -provider_del:
> -	icc_provider_del(provider);
> +err_unregister_nodes:
> +	imx_icc_unregister_nodes(&imx_provider->provider);
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(imx_icc_register);
> @@ -328,9 +329,8 @@ void imx_icc_unregister(struct platform_device *pdev)
>  {
>  	struct imx_icc_provider *imx_provider = platform_get_drvdata(pdev);
>  
> +	icc_provider_deregister(&imx_provider->provider);
>  	imx_icc_unregister_nodes(&imx_provider->provider);
> -
> -	icc_provider_del(&imx_provider->provider);
>  }
>  EXPORT_SYMBOL_GPL(imx_icc_unregister);
>
Konrad Dybcio Feb. 3, 2023, 2:53 a.m. UTC | #13
On 1.02.2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: 62feb14ee8a3 ("interconnect: qcom: Consolidate interconnect RPM support")
> Fixes: 30c8fa3ec61a ("interconnect: qcom: Add MSM8916 interconnect provider driver")
> Cc: stable@vger.kernel.org	# 5.7
> Cc: Jun Nie <jun.nie@linaro.org>
> Cc: Georgi Djakov <georgi.djakov@linaro.org>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
>  drivers/interconnect/qcom/icc-rpm.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/interconnect/qcom/icc-rpm.c b/drivers/interconnect/qcom/icc-rpm.c
> index da595059cafd..4d0997b210f7 100644
> --- a/drivers/interconnect/qcom/icc-rpm.c
> +++ b/drivers/interconnect/qcom/icc-rpm.c
> @@ -502,7 +502,6 @@ int qnoc_probe(struct platform_device *pdev)
>  	}
>  
>  	provider = &qp->provider;
> -	INIT_LIST_HEAD(&provider->nodes);
>  	provider->dev = dev;
>  	provider->set = qcom_icc_set;
>  	provider->pre_aggregate = qcom_icc_pre_bw_aggregate;
> @@ -510,11 +509,7 @@ int qnoc_probe(struct platform_device *pdev)
>  	provider->xlate_extended = qcom_icc_xlate_extended;
>  	provider->data = data;
>  
> -	ret = icc_provider_add(provider);
> -	if (ret) {
> -		dev_err(dev, "error adding interconnect provider: %d\n", ret);
> -		goto err_disable_clks;
> -	}
> +	icc_provider_init(provider);
>  
>  	for (i = 0; i < num_nodes; i++) {
>  		size_t j;
> @@ -522,7 +517,7 @@ int qnoc_probe(struct platform_device *pdev)
>  		node = icc_node_create(qnodes[i]->id);
>  		if (IS_ERR(node)) {
>  			ret = PTR_ERR(node);
> -			goto err;
> +			goto err_remove_nodes;
>  		}
>  
>  		node->name = qnodes[i]->name;
> @@ -536,19 +531,25 @@ int qnoc_probe(struct platform_device *pdev)
>  	}
>  	data->num_nodes = num_nodes;
>  
> +	ret = icc_provider_register(provider);
> +	if (ret)
> +		goto err_remove_nodes;
> +
>  	platform_set_drvdata(pdev, qp);
>  
>  	/* Populate child NoC devices if any */
>  	if (of_get_child_count(dev->of_node) > 0) {
>  		ret = of_platform_populate(dev->of_node, NULL, NULL, dev);
>  		if (ret)
> -			goto err;
> +			goto err_deregister_provider;
>  	}
>  
>  	return 0;
> -err:
> +
> +err_deregister_provider:
> +	icc_provider_deregister(provider);
> +err_remove_nodes:
>  	icc_nodes_remove(provider);
> -	icc_provider_del(provider);
>  err_disable_clks:
>  	clk_bulk_disable_unprepare(qp->num_clks, qp->bus_clks);
>  
> @@ -560,9 +561,9 @@ int qnoc_remove(struct platform_device *pdev)
>  {
>  	struct qcom_icc_provider *qp = platform_get_drvdata(pdev);
>  
> +	icc_provider_deregister(&qp->provider);
>  	icc_nodes_remove(&qp->provider);
>  	clk_bulk_disable_unprepare(qp->num_clks, qp->bus_clks);
> -	icc_provider_del(&qp->provider);
>  
>  	return 0;
>  }
Konrad Dybcio Feb. 3, 2023, 2:55 a.m. UTC | #14
On 1.02.2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: 976daac4a1c5 ("interconnect: qcom: Consolidate interconnect RPMh support")
> Cc: stable@vger.kernel.org      # 5.7
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
>  drivers/interconnect/qcom/icc-rpmh.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/interconnect/qcom/icc-rpmh.c b/drivers/interconnect/qcom/icc-rpmh.c
> index 5168bbf3d92f..fdb5e58e408b 100644
> --- a/drivers/interconnect/qcom/icc-rpmh.c
> +++ b/drivers/interconnect/qcom/icc-rpmh.c
> @@ -192,9 +192,10 @@ int qcom_icc_rpmh_probe(struct platform_device *pdev)
>  	provider->pre_aggregate = qcom_icc_pre_aggregate;
>  	provider->aggregate = qcom_icc_aggregate;
>  	provider->xlate_extended = qcom_icc_xlate_extended;
> -	INIT_LIST_HEAD(&provider->nodes);
>  	provider->data = data;
>  
> +	icc_provider_init(provider);
> +
>  	qp->dev = dev;
>  	qp->bcms = desc->bcms;
>  	qp->num_bcms = desc->num_bcms;
> @@ -203,10 +204,6 @@ int qcom_icc_rpmh_probe(struct platform_device *pdev)
>  	if (IS_ERR(qp->voter))
>  		return PTR_ERR(qp->voter);
>  
> -	ret = icc_provider_add(provider);
> -	if (ret)
> -		return ret;
> -
>  	for (i = 0; i < qp->num_bcms; i++)
>  		qcom_icc_bcm_init(qp->bcms[i], dev);
>  
> @@ -218,7 +215,7 @@ int qcom_icc_rpmh_probe(struct platform_device *pdev)
>  		node = icc_node_create(qn->id);
>  		if (IS_ERR(node)) {
>  			ret = PTR_ERR(node);
> -			goto err;
> +			goto err_remove_nodes;
>  		}
>  
>  		node->name = qn->name;
> @@ -232,19 +229,27 @@ int qcom_icc_rpmh_probe(struct platform_device *pdev)
>  	}
>  
>  	data->num_nodes = num_nodes;
> +
> +	ret = icc_provider_register(provider);
> +	if (ret)
> +		goto err_remove_nodes;
> +
>  	platform_set_drvdata(pdev, qp);
>  
>  	/* Populate child NoC devices if any */
>  	if (of_get_child_count(dev->of_node) > 0) {
>  		ret = of_platform_populate(dev->of_node, NULL, NULL, dev);
>  		if (ret)
> -			goto err;
> +			goto err_deregister_provider;
>  	}
>  
>  	return 0;
> -err:
> +
> +err_deregister_provider:
> +	icc_provider_deregister(provider);
> +err_remove_nodes:
>  	icc_nodes_remove(provider);
> -	icc_provider_del(provider);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(qcom_icc_rpmh_probe);
> @@ -253,8 +258,8 @@ int qcom_icc_rpmh_remove(struct platform_device *pdev)
>  {
>  	struct qcom_icc_provider *qp = platform_get_drvdata(pdev);
>  
> +	icc_provider_deregister(&qp->provider);
>  	icc_nodes_remove(&qp->provider);
> -	icc_provider_del(&qp->provider);
>  
>  	return 0;
>  }
Konrad Dybcio Feb. 3, 2023, 2:57 a.m. UTC | #15
On 1.02.2023 11:15, Johan Hovold wrote:
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: e6f0d6a30f73 ("interconnect: qcom: Add SM8550 interconnect provider driver")
> Cc: Abel Vesa <abel.vesa@linaro.org>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
>  drivers/interconnect/qcom/sm8550.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/interconnect/qcom/sm8550.c b/drivers/interconnect/qcom/sm8550.c
> index 54fa027ab961..7ab492ca8fe0 100644
> --- a/drivers/interconnect/qcom/sm8550.c
> +++ b/drivers/interconnect/qcom/sm8550.c
> @@ -2197,9 +2197,10 @@ static int qnoc_probe(struct platform_device *pdev)
>  	provider->pre_aggregate = qcom_icc_pre_aggregate;
>  	provider->aggregate = qcom_icc_aggregate;
>  	provider->xlate_extended = qcom_icc_xlate_extended;
> -	INIT_LIST_HEAD(&provider->nodes);
>  	provider->data = data;
>  
> +	icc_provider_init(provider);
> +
>  	qp->dev = &pdev->dev;
>  	qp->bcms = desc->bcms;
>  	qp->num_bcms = desc->num_bcms;
> @@ -2208,12 +2209,6 @@ static int qnoc_probe(struct platform_device *pdev)
>  	if (IS_ERR(qp->voter))
>  		return PTR_ERR(qp->voter);
>  
> -	ret = icc_provider_add(provider);
> -	if (ret) {
> -		dev_err_probe(&pdev->dev, ret,
> -			      "error adding interconnect provider\n");
> -		return ret;
> -	}
>  
>  	for (i = 0; i < qp->num_bcms; i++)
>  		qcom_icc_bcm_init(qp->bcms[i], &pdev->dev);
> @@ -2227,7 +2222,7 @@ static int qnoc_probe(struct platform_device *pdev)
>  		node = icc_node_create(qnodes[i]->id);
>  		if (IS_ERR(node)) {
>  			ret = PTR_ERR(node);
> -			goto err;
> +			goto err_remove_nodes;
>  		}
>  
>  		node->name = qnodes[i]->name;
> @@ -2241,12 +2236,17 @@ static int qnoc_probe(struct platform_device *pdev)
>  	}
>  	data->num_nodes = num_nodes;
>  
> +	ret = icc_provider_register(provider);
> +	if (ret)
> +		goto err_remove_nodes;
> +
>  	platform_set_drvdata(pdev, qp);
>  
>  	return 0;
> -err:
> +
> +err_remove_nodes:
>  	icc_nodes_remove(provider);
> -	icc_provider_del(provider);
> +
>  	return ret;
>  }
>  
> @@ -2254,8 +2254,8 @@ static int qnoc_remove(struct platform_device *pdev)
>  {
>  	struct qcom_icc_provider *qp = platform_get_drvdata(pdev);
>  
> +	icc_provider_deregister(&qp->provider);
>  	icc_nodes_remove(&qp->provider);
> -	icc_provider_del(&qp->provider);
>  
>  	return 0;
>  }
Konrad Dybcio Feb. 3, 2023, 2:58 a.m. UTC | #16
On 1.02.2023 11:15, Johan Hovold wrote:
> Now that all interconnect drivers have been converted to the new
> provider registration API, the old racy interface can be removed.
> 
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>

Konrad
>  drivers/interconnect/core.c           | 16 ----------------
>  include/linux/interconnect-provider.h | 11 -----------
>  2 files changed, 27 deletions(-)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 93d27ff8eef6..b8917823fd95 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -1078,22 +1078,6 @@ void icc_provider_deregister(struct icc_provider *provider)
>  }
>  EXPORT_SYMBOL_GPL(icc_provider_deregister);
>  
> -int icc_provider_add(struct icc_provider *provider)
> -{
> -	icc_provider_init(provider);
> -
> -	return icc_provider_register(provider);
> -}
> -EXPORT_SYMBOL_GPL(icc_provider_add);
> -
> -void icc_provider_del(struct icc_provider *provider)
> -{
> -	WARN_ON(!list_empty(&provider->nodes));
> -
> -	icc_provider_deregister(provider);
> -}
> -EXPORT_SYMBOL_GPL(icc_provider_del);
> -
>  static const struct of_device_id __maybe_unused ignore_list[] = {
>  	{ .compatible = "qcom,sc7180-ipa-virt" },
>  	{ .compatible = "qcom,sc8180x-ipa-virt" },
> diff --git a/include/linux/interconnect-provider.h b/include/linux/interconnect-provider.h
> index d12cd18aab3f..b9af9016a95e 100644
> --- a/include/linux/interconnect-provider.h
> +++ b/include/linux/interconnect-provider.h
> @@ -125,8 +125,6 @@ int icc_nodes_remove(struct icc_provider *provider);
>  void icc_provider_init(struct icc_provider *provider);
>  int icc_provider_register(struct icc_provider *provider);
>  void icc_provider_deregister(struct icc_provider *provider);
> -int icc_provider_add(struct icc_provider *provider);
> -void icc_provider_del(struct icc_provider *provider);
>  struct icc_node_data *of_icc_get_from_provider(struct of_phandle_args *spec);
>  void icc_sync_state(struct device *dev);
>  
> @@ -179,15 +177,6 @@ static inline int icc_provider_register(struct icc_provider *provider)
>  
>  static inline void icc_provider_deregister(struct icc_provider *provider) { }
>  
> -static inline int icc_provider_add(struct icc_provider *provider)
> -{
> -	return -ENOTSUPP;
> -}
> -
> -static inline void icc_provider_del(struct icc_provider *provider)
> -{
> -}
> -
>  static inline struct icc_node_data *of_icc_get_from_provider(struct of_phandle_args *spec)
>  {
>  	return ERR_PTR(-ENOTSUPP);
Jun Nie Feb. 3, 2023, 4:06 a.m. UTC | #17
On Wed, Feb 1, 2023 at 18:16, Johan Hovold <johan+linaro@kernel.org> wrote:
>
> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
>
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
>
> Fixes: 62feb14ee8a3 ("interconnect: qcom: Consolidate interconnect RPM support")
> Fixes: 30c8fa3ec61a ("interconnect: qcom: Add MSM8916 interconnect provider driver")
> Cc: stable@vger.kernel.org      # 5.7
> Cc: Jun Nie <jun.nie@linaro.org>
> Cc: Georgi Djakov <georgi.djakov@linaro.org>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/interconnect/qcom/icc-rpm.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
Reviewed-by: Jun Nie <jun.nie@linaro.org>
Luca Ceresoli Feb. 3, 2023, 4:01 p.m. UTC | #18
Hello Johan,

On Wed,  1 Feb 2023 11:15:40 +0100
Johan Hovold <johan+linaro@kernel.org> wrote:

> The current interconnect provider registration interface is inherently
> racy as nodes are not added until after adding the provider. This
> can specifically cause racing DT lookups to fail.
> 
> Switch to using the new API where the provider is not registered until
> after it has been fully initialised.
> 
> Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> Cc: stable@vger.kernel.org      # 5.8
> Cc: Leonard Crestez <leonard.crestez@nxp.com>
> Cc: Alexandre Bailon <abailon@baylibre.com>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>

Georgi pointed me to this series after I reported a bug yesterday [0],
that I found on iMX8MP. So I ran some tests with my original, failing
tree, minus one patch with my debugging code to hunt for the bug, plus
patches 1-4 of this series.

The original code was failing approximately 5-10% of the time. With your
4 patches applied it ran 139 times with zero errors, which looks great! I
won't be able to do more testing until next Monday to be extra sure.

[0]
https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
Johan Hovold Feb. 6, 2023, 8:09 a.m. UTC | #19
On Fri, Feb 03, 2023 at 05:01:21PM +0100, Luca Ceresoli wrote:
> Hello Johan,
> 
> On Wed,  1 Feb 2023 11:15:40 +0100
> Johan Hovold <johan+linaro@kernel.org> wrote:
> 
> > The current interconnect provider registration interface is inherently
> > racy as nodes are not added until after adding the provider. This
> > can specifically cause racing DT lookups to fail.
> > 
> > Switch to using the new API where the provider is not registered until
> > after it has been fully initialised.
> > 
> > Fixes: f0d8048525d7 ("interconnect: Add imx core driver")
> > Cc: stable@vger.kernel.org      # 5.8
> > Cc: Leonard Crestez <leonard.crestez@nxp.com>
> > Cc: Alexandre Bailon <abailon@baylibre.com>
> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> 
> Georgi pointed me to this series after I reported a bug yesterday [0],
> that I found on iMX8MP. So I ran some tests with my original, failing
> tree, minus one patch with my debugging code to hunt for the bug, plus
> patches 1-4 of this series.
> 
> The original code was failing approximately 5-10% of the time. With your
> 4 patches applied it ran 139 times with zero errors, which looks great! I
> won't be able to do more testing until next Monday to be extra sure.

Thanks for testing.

It indeed looks like you're hitting the same race. As the imx
interconnect driver also sets the provider data num_nodes count before
adding the nodes, a racing lookup passes the index check and gets a NULL
node, which results in that NULL-deref (where the qcom driver failed a
bit more gracefully).

Johan

> [0]
> https://lore.kernel.org/linux-arm-kernel/20230202175525.3dba79a7@booty/T/#u
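
The difference between the two failure modes mentioned in the reply above
can be sketched with a userspace model of a onecell-style lookup. This is
an illustration under assumptions rather than the kernel code: `struct
model_onecell_data`, `model_xlate_onecell()` and the `MODEL_*` results are
hypothetical names. The model only assumes what the thread states: the
lookup rejects an index that is not below `num_nodes` ("invalid index 0"
in the log), and otherwise hands back whatever sits in the node array.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES	4

struct model_node {
	int id;
};

/* Hypothetical stand-in for the provider data used by the lookup. */
struct model_onecell_data {
	size_t num_nodes;
	struct model_node *nodes[MODEL_MAX_NODES];
};

enum model_result {
	MODEL_OK,
	MODEL_INVALID_INDEX,	/* graceful failure, an error is returned */
	MODEL_NULL_NODE,	/* a real caller would dereference NULL */
};

/* Index check first, then return whatever the array slot holds. */
static enum model_result
model_xlate_onecell(const struct model_onecell_data *data, size_t idx)
{
	if (idx >= data->num_nodes)
		return MODEL_INVALID_INDEX;
	if (!data->nodes[idx])
		return MODEL_NULL_NODE;
	return MODEL_OK;
}

/* Count set after the nodes: an early lookup sees num_nodes == 0. */
static enum model_result lookup_with_count_set_last(void)
{
	struct model_onecell_data data = { 0 };

	return model_xlate_onecell(&data, 0);
}

/* Count set before the nodes: an early lookup finds an empty slot. */
static enum model_result lookup_with_count_set_first(void)
{
	struct model_onecell_data data = { 0 };

	data.num_nodes = 1;
	return model_xlate_onecell(&data, 0);
}

/* Fully initialised data, as seen once registration happens last. */
static enum model_result lookup_after_full_init(void)
{
	static struct model_node node = { .id = 7 };
	struct model_onecell_data data = { 0 };

	data.nodes[0] = &node;
	data.num_nodes = 1;
	return model_xlate_onecell(&data, 0);
}
```

In this model, setting the count last gives the graceful invalid-index
error, setting it first exposes a NULL node, and neither can happen once
the lookup is only possible after the data is complete.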