diff mbox series

[2/7] net: ethernet: ti: cpsw: Don't error out in .remove()

Message ID 20231117091655.872426-3-u.kleine-koenig@pengutronix.de
State New
Headers show
Series net: ethernet: Convert to platform remove callback | expand

Commit Message

Uwe Kleine-König Nov. 17, 2023, 9:16 a.m. UTC
Returning early from .remove() with an error code still results in the
driver unbinding the device. So the driver core ignores the returned error
code and the resources that were not freed are never catched up. In
combination with devm this also often results in use-after-free bugs.

If runtime resume fails, it's still important to free all resources, so
don't return with an error code, but emit an error message and continue
freeing acquired stuff.

This prepares changing cpsw_remove() to return void.

Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Roger Quadros Nov. 18, 2023, 10 a.m. UTC | #1
On 17/11/2023 11:16, Uwe Kleine-König wrote:
> Returning early from .remove() with an error code still results in the
> driver unbinding the device. So the driver core ignores the returned error
> code and the resources that were not freed are never catched up. In
> combination with devm this also often results in use-after-free bugs.
> 
> If runtime resume fails, it's still important to free all resources, so
> don't return with an error code, but emit an error message and continue
> freeing acquired stuff.
> 
> This prepares changing cpsw_remove() to return void.
> 
> Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove")
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index ca4d4548f85e..db5a2ba8a6d4 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -1727,16 +1727,24 @@ static int cpsw_remove(struct platform_device *pdev)
>  	struct cpsw_common *cpsw = platform_get_drvdata(pdev);
>  	int i, ret;
>  
> -	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	ret = pm_runtime_get_sync(&pdev->dev);
>  	if (ret < 0)
> -		return ret;
> +		/* There is no need to do something about that. The important
> +		 * thing is to not exit early, but do all cleanup that doesn't
> +		 * require register access.
> +		 */
> +		dev_err(&pdev->dev, "runtime resume failed (%pe)\n",
> +			ERR_PTR(ret));
>  
>  	for (i = 0; i < cpsw->data.slaves; i++)
>  		if (cpsw->slaves[i].ndev)
>  			unregister_netdev(cpsw->slaves[i].ndev);
>  
> -	cpts_release(cpsw->cpts);
> -	cpdma_ctlr_destroy(cpsw->dma);
> +	if (ret >= 0) {
> +		cpts_release(cpsw->cpts);

cpts_release() only does clk_unprepare().
Why not do that in the error path as well?

> +		cpdma_ctlr_destroy(cpsw->dma);

cpdma_ctrl_destroy() not only stops the DMA controller
but also frees up the channel and calls dma_free_coherent?

We still want to free up the channel and dma_free_coherent in the
error path?

> +	}
> +
>  	cpsw_remove_dt(pdev);
>  	pm_runtime_put_sync(&pdev->dev);
>  	pm_runtime_disable(&pdev->dev);
Uwe Kleine-König Nov. 19, 2023, 9:40 a.m. UTC | #2
Hello Roger,

[dropping Mugunthan V N from Cc, their address bounces, and adding the
net maintainers that I failed to do for the original submission]

On Sat, Nov 18, 2023 at 12:00:08PM +0200, Roger Quadros wrote:
> > -	cpts_release(cpsw->cpts);
> > -	cpdma_ctlr_destroy(cpsw->dma);
> > +	if (ret >= 0) {
> > +		cpts_release(cpsw->cpts);
> 
> cpts_release() only does clk_unprepare().
> Why not do that in the error path as well?

I thought I saw the pm suspend do a clk_unprepare, but I don't find
that. Indeed this step shouldn't be skipped.

> > +		cpdma_ctlr_destroy(cpsw->dma);
> 
> cpdma_ctrl_destroy() not only stops the DMA controller
> but also frees up the channel and calls dma_free_coherent?
> 
> We still want to free up the channel and dma_free_coherent in the
> error path?

Then we need to split the function and only do the software part. I
don't have hardware to test this change and getting it right without
testing seems to be hard.

May I suggest that only do the conversion to remove_new (that doesn't
modify the resource leak behaviour) and you care for fixing the leaks?

My change would then just look as follows:

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ca4d4548f85e..1251160b563b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1722,14 +1722,16 @@ static int cpsw_probe(struct platform_device *pdev)
 	return ret;
 }
 
-static int cpsw_remove(struct platform_device *pdev)
+static void cpsw_remove(struct platform_device *pdev)
 {
 	struct cpsw_common *cpsw = platform_get_drvdata(pdev);
 	int i, ret;
 
 	ret = pm_runtime_resume_and_get(&pdev->dev);
-	if (ret < 0)
-		return ret;
+	if (ret < 0) {
+		dev_err(&pdev->dev, "Failed to resume device (%pe)\n",
+			ERR_PTR(ret))
+	}
 
 	for (i = 0; i < cpsw->data.slaves; i++)
 		if (cpsw->slaves[i].ndev)
@@ -1740,7 +1742,6 @@ static int cpsw_remove(struct platform_device *pdev)
 	cpsw_remove_dt(pdev);
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
-	return 0;
 }
 
 #ifdef CONFIG_PM_SLEEP
@@ -1795,7 +1796,7 @@ static struct platform_driver cpsw_driver = {
 		.of_match_table = cpsw_of_mtable,
 	},
 	.probe = cpsw_probe,
-	.remove = cpsw_remove,
+	.remove_new = cpsw_remove,
 };
 
 module_platform_driver(cpsw_driver);

A similar question for the other two cpsw drivers.

Best regards
Uwe
diff mbox series

Patch

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ca4d4548f85e..db5a2ba8a6d4 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1727,16 +1727,24 @@  static int cpsw_remove(struct platform_device *pdev)
 	struct cpsw_common *cpsw = platform_get_drvdata(pdev);
 	int i, ret;
 
-	ret = pm_runtime_resume_and_get(&pdev->dev);
+	ret = pm_runtime_get_sync(&pdev->dev);
 	if (ret < 0)
-		return ret;
+		/* There is no need to do something about that. The important
+		 * thing is to not exit early, but do all cleanup that doesn't
+		 * require register access.
+		 */
+		dev_err(&pdev->dev, "runtime resume failed (%pe)\n",
+			ERR_PTR(ret));
 
 	for (i = 0; i < cpsw->data.slaves; i++)
 		if (cpsw->slaves[i].ndev)
 			unregister_netdev(cpsw->slaves[i].ndev);
 
-	cpts_release(cpsw->cpts);
-	cpdma_ctlr_destroy(cpsw->dma);
+	if (ret >= 0) {
+		cpts_release(cpsw->cpts);
+		cpdma_ctlr_destroy(cpsw->dma);
+	}
+
 	cpsw_remove_dt(pdev);
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);