[v2] drm/msm: add null checks for drm device to avoid crash during probe defer

Message ID 1654249343-24959-1-git-send-email-quic_vpolimer@quicinc.com
State New

Commit Message

Vinod Polimera June 3, 2022, 9:42 a.m. UTC
During probe deferral, the drm device is not yet initialized, so an
external shutdown trigger that tries to clean up the drm device crashes.
Add checks to skip drm device cleanup in such cases.

BUG: unable to handle kernel NULL pointer dereference at virtual
address 00000000000000b8

Call trace:

drm_atomic_helper_shutdown+0x44/0x144
msm_pdev_shutdown+0x2c/0x38
platform_shutdown+0x2c/0x38
device_shutdown+0x158/0x210
kernel_restart_prepare+0x40/0x4c
kernel_restart+0x20/0x6c
__arm64_sys_reboot+0x194/0x23c
invoke_syscall+0x50/0x13c
el0_svc_common+0xa0/0x17c
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x20/0x70
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x1a8/0x1ac

Changes in v2:
- Add fixes tag.

Fixes: 623f279c778 ("drm/msm: fix shutdown hook in case GPU components failed to bind")
Signed-off-by: Vinod Polimera <quic_vpolimer@quicinc.com>
---
 drivers/gpu/drm/msm/msm_drv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Dmitry Baryshkov June 15, 2022, 12:20 p.m. UTC | #1
On 03/06/2022 12:42, Vinod Polimera wrote:
> During probe defer, drm device is not initialized and an external
> trigger to shutdown is trying to clean up drm device leading to crash.
> Add checks to avoid drm device cleanup in such cases.
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 00000000000000b8
> 
> Call trace:
> 
> drm_atomic_helper_shutdown+0x44/0x144
> msm_pdev_shutdown+0x2c/0x38
> platform_shutdown+0x2c/0x38
> device_shutdown+0x158/0x210
> kernel_restart_prepare+0x40/0x4c
> kernel_restart+0x20/0x6c
> __arm64_sys_reboot+0x194/0x23c
> invoke_syscall+0x50/0x13c
> el0_svc_common+0xa0/0x17c
> do_el0_svc_compat+0x28/0x34
> el0_svc_compat+0x20/0x70
> el0t_32_sync_handler+0xa8/0xcc
> el0t_32_sync+0x1a8/0x1ac
> 
> Changes in v2:
> - Add fixes tag.
> 
> Fixes: 623f279c778 ("drm/msm: fix shutdown hook in case GPU components failed to bind")
> Signed-off-by: Vinod Polimera <quic_vpolimer@quicinc.com>

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>


> ---
>   drivers/gpu/drm/msm/msm_drv.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 4448536..d62ac66 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device *dev)
>   	struct msm_drm_private *priv = dev->dev_private;
>   	struct msm_kms *kms = priv->kms;
>   
> +	if (!irq_has_action(kms->irq))
> +		return;
> +
>   	kms->funcs->irq_uninstall(kms);
>   	if (kms->irq_requested)
>   		free_irq(kms->irq, dev);
> @@ -259,6 +262,7 @@ static int msm_drm_uninit(struct device *dev)
>   
>   	ddev->dev_private = NULL;
>   	drm_dev_put(ddev);
> +	priv->dev = NULL;
>   
>   	destroy_workqueue(priv->wq);
>   
> @@ -1167,7 +1171,7 @@ void msm_drv_shutdown(struct platform_device *pdev)
>   	struct msm_drm_private *priv = platform_get_drvdata(pdev);
>   	struct drm_device *drm = priv ? priv->dev : NULL;
>   
> -	if (!priv || !priv->kms)
> +	if (!priv || !priv->kms || !drm)
>   		return;
>   
>   	drm_atomic_helper_shutdown(drm);
Dmitry Baryshkov June 15, 2022, 12:23 p.m. UTC | #2
On 03/06/2022 12:42, Vinod Polimera wrote:
> During probe defer, drm device is not initialized and an external
> trigger to shutdown is trying to clean up drm device leading to crash.
> Add checks to avoid drm device cleanup in such cases.
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual
> address 00000000000000b8
> 
> Call trace:
> 
> drm_atomic_helper_shutdown+0x44/0x144
> msm_pdev_shutdown+0x2c/0x38
> platform_shutdown+0x2c/0x38
> device_shutdown+0x158/0x210
> kernel_restart_prepare+0x40/0x4c
> kernel_restart+0x20/0x6c
> __arm64_sys_reboot+0x194/0x23c
> invoke_syscall+0x50/0x13c
> el0_svc_common+0xa0/0x17c
> do_el0_svc_compat+0x28/0x34
> el0_svc_compat+0x20/0x70
> el0t_32_sync_handler+0xa8/0xcc
> el0t_32_sync+0x1a8/0x1ac
> 
> Changes in v2:
> - Add fixes tag.
> 
> Fixes: 623f279c778 ("drm/msm: fix shutdown hook in case GPU components failed to bind")
> Signed-off-by: Vinod Polimera <quic_vpolimer@quicinc.com>
> ---
>   drivers/gpu/drm/msm/msm_drv.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index 4448536..d62ac66 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device *dev)
>   	struct msm_drm_private *priv = dev->dev_private;
>   	struct msm_kms *kms = priv->kms;
>   
> +	if (!irq_has_action(kms->irq))
> +		return;

On second thought, I'd still prefer a variable here. irq_has_action()
checks whether there is _any_ IRQ handler for this IRQ. While nobody
shares this IRQ with us, I'd prefer to make it clear that we are
checking for our own IRQ handler rather than any IRQ handler.
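
The distinction can be illustrated with a minimal user-space sketch. It is
only a model: every model_* name below is hypothetical and none of it is
kernel API. It assumes a line that can carry several handlers, and it
contrasts an "any handler registered" query (the analogue of
irq_has_action()) with a per-driver flag in the spirit of the existing
kms->irq_requested.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_MAX_ACTIONS 4

struct model_irq_line {
	const void *owners[MODEL_MAX_ACTIONS];	/* cookie of each registered handler */
	size_t nr_actions;
};

struct model_kms {
	struct model_irq_line *line;
	bool irq_requested;	/* true only while *our* handler is registered */
};

/* Analogue of irq_has_action(): true if anyone has a handler on the line. */
static bool model_line_has_action(const struct model_irq_line *line)
{
	return line->nr_actions > 0;
}

/* Register a handler for 'owner'; returns false if the line is full. */
static bool model_request_irq(struct model_irq_line *line, const void *owner)
{
	if (line->nr_actions == MODEL_MAX_ACTIONS)
		return false;
	line->owners[line->nr_actions++] = owner;
	return true;
}

/* The driver records whether its own request succeeded. */
static bool model_kms_request_irq(struct model_kms *kms)
{
	kms->irq_requested = model_request_irq(kms->line, kms);
	return kms->irq_requested;
}

/* Teardown decision based on our own state, not on the state of the line. */
static bool model_should_uninstall(const struct model_kms *kms)
{
	return kms->irq_requested;
}
```

If some other owner has registered on the line, model_line_has_action()
already reports true even though model_should_uninstall() is still false
for this driver, which is the case the per-driver variable makes explicit.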

> +
>   	kms->funcs->irq_uninstall(kms);
>   	if (kms->irq_requested)
>   		free_irq(kms->irq, dev);
> @@ -259,6 +262,7 @@ static int msm_drm_uninit(struct device *dev)
>   
>   	ddev->dev_private = NULL;
>   	drm_dev_put(ddev);
> +	priv->dev = NULL;
>   
>   	destroy_workqueue(priv->wq);
>   
> @@ -1167,7 +1171,7 @@ void msm_drv_shutdown(struct platform_device *pdev)
>   	struct msm_drm_private *priv = platform_get_drvdata(pdev);
>   	struct drm_device *drm = priv ? priv->dev : NULL;
>   
> -	if (!priv || !priv->kms)
> +	if (!priv || !priv->kms || !drm)
>   		return;
>   
>   	drm_atomic_helper_shutdown(drm);
Dmitry Baryshkov Aug. 26, 2022, 8:41 a.m. UTC | #3
On 15/06/2022 15:23, Dmitry Baryshkov wrote:
> On 03/06/2022 12:42, Vinod Polimera wrote:
>> During probe defer, drm device is not initialized and an external
>> trigger to shutdown is trying to clean up drm device leading to crash.
>> Add checks to avoid drm device cleanup in such cases.
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual
>> address 00000000000000b8
>>
>> Call trace:
>>
>> drm_atomic_helper_shutdown+0x44/0x144
>> msm_pdev_shutdown+0x2c/0x38
>> platform_shutdown+0x2c/0x38
>> device_shutdown+0x158/0x210
>> kernel_restart_prepare+0x40/0x4c
>> kernel_restart+0x20/0x6c
>> __arm64_sys_reboot+0x194/0x23c
>> invoke_syscall+0x50/0x13c
>> el0_svc_common+0xa0/0x17c
>> do_el0_svc_compat+0x28/0x34
>> el0_svc_compat+0x20/0x70
>> el0t_32_sync_handler+0xa8/0xcc
>> el0t_32_sync+0x1a8/0x1ac
>>
>> Changes in v2:
>> - Add fixes tag.
>>
>> Fixes: 623f279c778 ("drm/msm: fix shutdown hook in case GPU components 
>> failed to bind")
>> Signed-off-by: Vinod Polimera <quic_vpolimer@quicinc.com>
>> ---
>>   drivers/gpu/drm/msm/msm_drv.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/msm/msm_drv.c 
>> b/drivers/gpu/drm/msm/msm_drv.c
>> index 4448536..d62ac66 100644
>> --- a/drivers/gpu/drm/msm/msm_drv.c
>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device *dev)
>>       struct msm_drm_private *priv = dev->dev_private;
>>       struct msm_kms *kms = priv->kms;
>> +    if (!irq_has_action(kms->irq))
>> +        return;
> 
> As a second thought I'd still prefer a variable here. irq_has_action 
> would check that there is _any_ IRQ handler for this IRQ. While we do 
> not have anybody sharing this IRQ, I'd prefer to be clear here, that we 
> do not want to uninstall our IRQ handler rather than any IRQ handler.

Vinod, do we still want to pursue this fix? If so, could you please 
update it according to the comment.

> 
>> +
>>       kms->funcs->irq_uninstall(kms);
>>       if (kms->irq_requested)
>>           free_irq(kms->irq, dev);
>> @@ -259,6 +262,7 @@ static int msm_drm_uninit(struct device *dev)
>>       ddev->dev_private = NULL;
>>       drm_dev_put(ddev);
>> +    priv->dev = NULL;
>>       destroy_workqueue(priv->wq);
>> @@ -1167,7 +1171,7 @@ void msm_drv_shutdown(struct platform_device *pdev)
>>       struct msm_drm_private *priv = platform_get_drvdata(pdev);
>>       struct drm_device *drm = priv ? priv->dev : NULL;
>> -    if (!priv || !priv->kms)
>> +    if (!priv || !priv->kms || !drm)
>>           return;
>>       drm_atomic_helper_shutdown(drm);
> 
>
Vinod Polimera Sept. 27, 2022, 7:31 a.m. UTC | #4
> -----Original Message-----
> From: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> Sent: Friday, August 26, 2022 2:11 PM
> To: Vinod Polimera (QUIC) <quic_vpolimer@quicinc.com>; dri-
> devel@lists.freedesktop.org; linux-arm-msm@vger.kernel.org;
> freedreno@lists.freedesktop.org; devicetree@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; robdclark@gmail.com;
> dianders@chromium.org; vpolimer@quicinc.com; swboyd@chromium.org;
> kalyant@quicinc.com
> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
> during probe defer
> 
> On 15/06/2022 15:23, Dmitry Baryshkov wrote:
> > On 03/06/2022 12:42, Vinod Polimera wrote:
> >> During probe defer, drm device is not initialized and an external
> >> trigger to shutdown is trying to clean up drm device leading to crash.
> >> Add checks to avoid drm device cleanup in such cases.
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at virtual
> >> address 00000000000000b8
> >>
> >> Call trace:
> >>
> >> drm_atomic_helper_shutdown+0x44/0x144
> >> msm_pdev_shutdown+0x2c/0x38
> >> platform_shutdown+0x2c/0x38
> >> device_shutdown+0x158/0x210
> >> kernel_restart_prepare+0x40/0x4c
> >> kernel_restart+0x20/0x6c
> >> __arm64_sys_reboot+0x194/0x23c
> >> invoke_syscall+0x50/0x13c
> >> el0_svc_common+0xa0/0x17c
> >> do_el0_svc_compat+0x28/0x34
> >> el0_svc_compat+0x20/0x70
> >> el0t_32_sync_handler+0xa8/0xcc
> >> el0t_32_sync+0x1a8/0x1ac
> >>
> >> Changes in v2:
> >> - Add fixes tag.
> >>
> >> Fixes: 623f279c778 ("drm/msm: fix shutdown hook in case GPU
> components
> >> failed to bind")
> >> Signed-off-by: Vinod Polimera <quic_vpolimer@quicinc.com>
> >> ---
> >>   drivers/gpu/drm/msm/msm_drv.c | 6 +++++-
> >>   1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/msm_drv.c
> >> b/drivers/gpu/drm/msm/msm_drv.c
> >> index 4448536..d62ac66 100644
> >> --- a/drivers/gpu/drm/msm/msm_drv.c
> >> +++ b/drivers/gpu/drm/msm/msm_drv.c
> >> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device
> *dev)
> >>       struct msm_drm_private *priv = dev->dev_private;
> >>       struct msm_kms *kms = priv->kms;
> >> +    if (!irq_has_action(kms->irq))
> >> +        return;
> >
> > As a second thought I'd still prefer a variable here. irq_has_action
> > would check that there is _any_ IRQ handler for this IRQ. While we do
> > not have anybody sharing this IRQ, I'd prefer to be clear here, that we
> > do not want to uninstall our IRQ handler rather than any IRQ handler.
> 
> Vinod, do we still want to pursue this fix? If so, could you please
> update it according to the comment.
>
I looked around and found that many kernel drivers use irq_has_action()
to check whether the interrupt has been requested, so it seems to me an
acceptable way of doing it. A variable to track the state seems
unnecessary, since it would have to be managed race-free. Let me know
your views.
> >
> >> +
> >>       kms->funcs->irq_uninstall(kms);
> >>       if (kms->irq_requested)
> >>           free_irq(kms->irq, dev);
> >> @@ -259,6 +262,7 @@ static int msm_drm_uninit(struct device *dev)
> >>       ddev->dev_private = NULL;
> >>       drm_dev_put(ddev);
> >> +    priv->dev = NULL;
> >>       destroy_workqueue(priv->wq);
> >> @@ -1167,7 +1171,7 @@ void msm_drv_shutdown(struct
> platform_device *pdev)
> >>       struct msm_drm_private *priv = platform_get_drvdata(pdev);
> >>       struct drm_device *drm = priv ? priv->dev : NULL;
> >> -    if (!priv || !priv->kms)
> >> +    if (!priv || !priv->kms || !drm)
> >>           return;
> >>       drm_atomic_helper_shutdown(drm);
> >
> >
> 
> --
> With best wishes
> Dmitry

- Vinod P.
Javier Martinez Canillas Sept. 27, 2022, 9:02 a.m. UTC | #5
Hello Vinod and Dmitry,

On 9/27/22 09:31, Vinod Polimera wrote:
>> -----Original Message-----
>> From: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>> Sent: Friday, August 26, 2022 2:11 PM
>> To: Vinod Polimera (QUIC) <quic_vpolimer@quicinc.com>; dri-
>> devel@lists.freedesktop.org; linux-arm-msm@vger.kernel.org;
>> freedreno@lists.freedesktop.org; devicetree@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org; robdclark@gmail.com;
>> dianders@chromium.org; vpolimer@quicinc.com; swboyd@chromium.org;
>> kalyant@quicinc.com
>> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
>> during probe defer
>>

[...]

>> Vinod, do we still want to pursue this fix? If so, could you please
>> update it according to the comment.
>>

I don't think this patch is needed anymore, since AFAICT the issue has
been fixed by commit 0a58d2ae572a ("drm/msm: Make .remove and .shutdown
HW shutdown consistent") which is already in the drm/drm-next branch.
Vinod Polimera Sept. 27, 2022, 9:41 a.m. UTC | #6
> -----Original Message-----
> From: Javier Martinez Canillas <javierm@redhat.com>
> Sent: Tuesday, September 27, 2022 2:33 PM
> To: Vinod Polimera <vpolimer@qti.qualcomm.com>;
> dmitry.baryshkov@linaro.org; Vinod Polimera (QUIC)
> <quic_vpolimer@quicinc.com>; dri-devel@lists.freedesktop.org; linux-arm-
> msm@vger.kernel.org; freedreno@lists.freedesktop.org;
> devicetree@vger.kernel.org
> Cc: dianders@chromium.org; vpolimer@quicinc.com; Abhinav Kumar
> <abhinavk@quicinc.com>; linux-kernel@vger.kernel.org;
> swboyd@chromium.org; kalyant@quicinc.com
> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
> during probe defer
> 
> Hello Vinod and Dmitry,
> 
> On 9/27/22 09:31, Vinod Polimera wrote:
> >> -----Original Message-----
> >> From: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> >> Sent: Friday, August 26, 2022 2:11 PM
> >> To: Vinod Polimera (QUIC) <quic_vpolimer@quicinc.com>; dri-
> >> devel@lists.freedesktop.org; linux-arm-msm@vger.kernel.org;
> >> freedreno@lists.freedesktop.org; devicetree@vger.kernel.org
> >> Cc: linux-kernel@vger.kernel.org; robdclark@gmail.com;
> >> dianders@chromium.org; vpolimer@quicinc.com;
> swboyd@chromium.org;
> >> kalyant@quicinc.com
> >> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
> >> during probe defer
> >>
> 
> [...]
> 
> >> Vinod, do we still want to pursue this fix? If so, could you please
> >> update it according to the comment.
> >>
> 
> I don't think this patch is needed anymore, since AFAICT the issue has
> been fixed by commit 0a58d2ae572a ("drm/msm: Make .remove and
> .shutdown
> HW shutdown consistent") which is already in the drm/drm-next branch.
Yes, the issue will be fixed by commit 0a58d2ae572a ("drm/msm: Make .remove and .shutdown HW shutdown consistent"). Hence we can drop this patch.
> 
> --
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat

- Vinod P.
Patch

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4448536..d62ac66 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -142,6 +142,9 @@  static void msm_irq_uninstall(struct drm_device *dev)
 	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_kms *kms = priv->kms;
 
+	if (!irq_has_action(kms->irq))
+		return;
+
 	kms->funcs->irq_uninstall(kms);
 	if (kms->irq_requested)
 		free_irq(kms->irq, dev);
@@ -259,6 +262,7 @@  static int msm_drm_uninit(struct device *dev)
 
 	ddev->dev_private = NULL;
 	drm_dev_put(ddev);
+	priv->dev = NULL;
 
 	destroy_workqueue(priv->wq);
 
@@ -1167,7 +1171,7 @@  void msm_drv_shutdown(struct platform_device *pdev)
 	struct msm_drm_private *priv = platform_get_drvdata(pdev);
 	struct drm_device *drm = priv ? priv->dev : NULL;
 
-	if (!priv || !priv->kms)
+	if (!priv || !priv->kms || !drm)
 		return;
 
 	drm_atomic_helper_shutdown(drm);
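
The guard in the last hunk can be exercised in isolation with a small
user-space sketch. This is an illustration only, assuming simplified
stand-in types: the sketch_* names are hypothetical, and a counter stands
in for the call to drm_atomic_helper_shutdown(). It shows the three states
the check distinguishes: no driver data, driver data whose drm device
pointer is NULL (as after a deferred probe or msm_drm_uninit()), and a
fully set up device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not kernel types. */
struct sketch_drm_device {
	int shutdown_calls;	/* counts simulated helper invocations */
};

struct sketch_kms {
	int placeholder;
};

struct sketch_priv {
	struct sketch_drm_device *dev;
	struct sketch_kms *kms;
};

/* Mirrors the patched check; returns true if the helper would have run. */
static bool sketch_shutdown(struct sketch_priv *priv)
{
	struct sketch_drm_device *drm = priv ? priv->dev : NULL;

	if (!priv || !priv->kms || !drm)
		return false;

	drm->shutdown_calls++;	/* stands in for drm_atomic_helper_shutdown(drm) */
	return true;
}
```

With only the original `!priv || !priv->kms` test, the second state would
fall through and pass a NULL pointer to the helper.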