mbox series

[v2,00/10] drm/msm: probe deferral fixes

Message ID 20220913085320.8577-1-johan+linaro@kernel.org
Headers show
Series drm/msm: probe deferral fixes | expand

Message

Johan Hovold Sept. 13, 2022, 8:53 a.m. UTC
The MSM DRM driver is currently broken in multiple ways with respect to
probe deferral. Not only does the driver currently fail to probe again
after a late deferral, but due to a related use-after-free bug this also
triggers NULL-pointer dereferences.

These bugs are not new but have become critical with the release of
5.19 where probe is deferred in case the aux-bus EP panel driver has not
yet been loaded.

The underlying problem is lifetime issues due to careless use of
device-managed resources.

Specifically, device-managed resources allocated post component bind
must be tied to the lifetime of the aggregate DRM device or they will
not necessarily be released when binding of the aggregate device is
deferred.

The following call chain and pseudo code serves as an illustration of
the problem:

 - platform_probe(pdev1)
   - dp_display_probe()
     - component_add()

 - platform_probe(pdev2) 				// last component
   - dp_display_probe()					// d0
       - component_add()
         - try_to_bring_up_aggregate_device()
	   - devres_open_group(adev->parent)		// d1

	   - msm_drm_bind()
	     - msm_drm_init()
	       - component_bind_all()
	         - for_each_component()
		   - component_bind()
		     - devres_open_group(&pdev->dev)	// d2
	             - dp_display_bind()
		       - devm_kzalloc(&pdev->dev)	// a1, OK
		     - devres_close_group(&pdev->dev)	// d3

	       - dpu_kms_hw_init()
	         - for_each_panel()
	           - msm_dp_modeset_init()
		     - dp_display_request_irq()
		       - devm_request_irq(&pdev->dev)	// a2, BUG
		     - if (pdev == pdev2 && condition)
		       - return -EPROBE_DEFER;

	      - if (error)
 	        - component_unbind_all()
	          - for_each_component()
		    - component_unbind()
		      - dp_display_unbind()
		      - devres_release_group(&pdev->dev) // d4, only a1 is freed

           - if (error)
	     - devres_release_group(adev->parent)	// d5

The device-managed allocation a2 is buggy as its lifetime is tied to the
component platform device and will not be released when the aggregate
device bind fails (e.g. due to a probe deferral).

When pdev2 is later probed again, the attempt to allocate the IRQ a
second time will fail for pdev1 (which is still bound to its platform
driver).

This series fixes the lifetime issues by tying the lifetime of a2 (and
similar allocations) to the lifetime of the aggregate device so that a2
is released at d5.

In some cases, such has for the DP IRQ, the above situation can also be
avoided by moving the allocation in question to the platform driver
probe (d0) or component bind (between d2 and d3). But as doing so is not
a general fix, this can be done later as a cleanup/optimisation.

Johan

Changes in v2
 - use a custom devres action instead of amending the AUX bus interface
   (Doug)
 - split sanity check fixes and cleanups per bridge type (Dmitry)
 - add another Fixes tag for the missing bridge counter reset (Dmitry)


Johan Hovold (10):
  drm/msm: fix use-after-free on probe deferral
  drm/msm/dp: fix memory corruption with too many bridges
  drm/msm/dsi: fix memory corruption with too many bridges
  drm/msm/hdmi: fix memory corruption with too many bridges
  drm/msm/dp: fix IRQ lifetime
  drm/msm/dp: fix aux-bus EP lifetime
  drm/msm/dp: fix bridge lifetime
  drm/msm/hdmi: fix IRQ lifetime
  drm/msm/dp: drop modeset sanity checks
  drm/msm/dsi: drop modeset sanity checks

 drivers/gpu/drm/msm/dp/dp_display.c | 26 +++++++++++++++++++-------
 drivers/gpu/drm/msm/dp/dp_parser.c  |  6 +++---
 drivers/gpu/drm/msm/dp/dp_parser.h  |  5 +++--
 drivers/gpu/drm/msm/dsi/dsi.c       |  9 +++++----
 drivers/gpu/drm/msm/hdmi/hdmi.c     |  7 ++++++-
 drivers/gpu/drm/msm/msm_drv.c       |  1 +
 6 files changed, 37 insertions(+), 17 deletions(-)

Comments

Steev Klimaszewski Sept. 13, 2022, 8:23 p.m. UTC | #1
Hi Johan,

On 9/13/22 3:53 AM, Johan Hovold wrote:
> The MSM DRM driver is currently broken in multiple ways with respect to
> probe deferral. Not only does the driver currently fail to probe again
> after a late deferral, but due to a related use-after-free bug this also
> triggers NULL-pointer dereferences.
>
> These bugs are not new but have become critical with the release of
> 5.19 where probe is deferred in case the aux-bus EP panel driver has not
> yet been loaded.
>
> The underlying problem is lifetime issues due to careless use of
> device-managed resources.
>
> Specifically, device-managed resources allocated post component bind
> must be tied to the lifetime of the aggregate DRM device or they will
> not necessarily be released when binding of the aggregate device is
> deferred.
>
> The following call chain and pseudo code serves as an illustration of
> the problem:
>
>   - platform_probe(pdev1)
>     - dp_display_probe()
>       - component_add()
>
>   - platform_probe(pdev2) 				// last component
>     - dp_display_probe()					// d0
>         - component_add()
>           - try_to_bring_up_aggregate_device()
> 	   - devres_open_group(adev->parent)		// d1
>
> 	   - msm_drm_bind()
> 	     - msm_drm_init()
> 	       - component_bind_all()
> 	         - for_each_component()
> 		   - component_bind()
> 		     - devres_open_group(&pdev->dev)	// d2
> 	             - dp_display_bind()
> 		       - devm_kzalloc(&pdev->dev)	// a1, OK
> 		     - devres_close_group(&pdev->dev)	// d3
>
> 	       - dpu_kms_hw_init()
> 	         - for_each_panel()
> 	           - msm_dp_modeset_init()
> 		     - dp_display_request_irq()
> 		       - devm_request_irq(&pdev->dev)	// a2, BUG
> 		     - if (pdev == pdev2 && condition)
> 		       - return -EPROBE_DEFER;
>
> 	      - if (error)
>   	        - component_unbind_all()
> 	          - for_each_component()
> 		    - component_unbind()
> 		      - dp_display_unbind()
> 		      - devres_release_group(&pdev->dev) // d4, only a1 is freed
>
>             - if (error)
> 	     - devres_release_group(adev->parent)	// d5
>
> The device-managed allocation a2 is buggy as its lifetime is tied to the
> component platform device and will not be released when the aggregate
> device bind fails (e.g. due to a probe deferral).
>
> When pdev2 is later probed again, the attempt to allocate the IRQ a
> second time will fail for pdev1 (which is still bound to its platform
> driver).
>
> This series fixes the lifetime issues by tying the lifetime of a2 (and
> similar allocations) to the lifetime of the aggregate device so that a2
> is released at d5.
>
> In some cases, such has for the DP IRQ, the above situation can also be
> avoided by moving the allocation in question to the platform driver
> probe (d0) or component bind (between d2 and d3). But as doing so is not
> a general fix, this can be done later as a cleanup/optimisation.
>
> Johan
>
> Changes in v2
>   - use a custom devres action instead of amending the AUX bus interface
>     (Doug)
>   - split sanity check fixes and cleanups per bridge type (Dmitry)
>   - add another Fixes tag for the missing bridge counter reset (Dmitry)
>
>
> Johan Hovold (10):
>    drm/msm: fix use-after-free on probe deferral
>    drm/msm/dp: fix memory corruption with too many bridges
>    drm/msm/dsi: fix memory corruption with too many bridges
>    drm/msm/hdmi: fix memory corruption with too many bridges
>    drm/msm/dp: fix IRQ lifetime
>    drm/msm/dp: fix aux-bus EP lifetime
>    drm/msm/dp: fix bridge lifetime
>    drm/msm/hdmi: fix IRQ lifetime
>    drm/msm/dp: drop modeset sanity checks
>    drm/msm/dsi: drop modeset sanity checks
>
>   drivers/gpu/drm/msm/dp/dp_display.c | 26 +++++++++++++++++++-------
>   drivers/gpu/drm/msm/dp/dp_parser.c  |  6 +++---
>   drivers/gpu/drm/msm/dp/dp_parser.h  |  5 +++--
>   drivers/gpu/drm/msm/dsi/dsi.c       |  9 +++++----
>   drivers/gpu/drm/msm/hdmi/hdmi.c     |  7 ++++++-
>   drivers/gpu/drm/msm/msm_drv.c       |  1 +
>   6 files changed, 37 insertions(+), 17 deletions(-)
>
I've tested this on both sc8180x (Lenovo Flex 5G) and sdm850 (Lenovo 
Yoga C630), and both of them show the same issue:

[ 7.449305] platform ae9a000.displayport-controller: Fixing up cyclic 
dependency with ae01000.mdp [ 7.454344] Unable to handle kernel NULL 
pointer dereference at virtual address 0000000000000008 [ 7.454406] Mem 
abort info: [ 7.454423] ESR = 0x0000000096000004 [ 7.454446] EC = 0x25: 
DABT (current EL), IL = 32 bits [ 7.454475] SET = 0, FnV = 0 [ 7.454494] 
EA = 0, S1PTW = 0 [ 7.454512] FSC = 0x04: level 0 translation fault [ 
7.454539] Data abort info: [ 7.454556] ISV = 0, ISS = 0x00000004 [ 
7.454577] CM = 0, WnR = 0 [ 7.454595] user pgtable: 4k pages, 48-bit 
VAs, pgdp=0000000101504000 [ 7.454629] [0000000000000008] 
pgd=0000000000000000, p4d=0000000000000000 [ 7.454669] Internal error: 
Oops: 96000004 [#1] PREEMPT SMP [ 7.454700] Modules linked in: 
i2c_hid_of i2c_hid leds_qcom_lpg led_class_multicolor rtc_pm8xxx msm 
mdt_loader gpu_sched drm_dp_aux_bus drm_display_helper drm_kms_helper 
drm phy_qcom_edp llcc_qcom i2c_qcom_geni phy_qcom_qmp_combo 
phy_qcom_snps_femto_v2 phy_qcom_qmp_ufs phy_qcom_qmp_pcie ufs_qcom 
pwm_bl [ 7.454860] CPU: 2 PID: 76 Comm: kworker/u16:2 Not tainted 
5.19.0-rc8-next-20220728 #26 [ 7.454902] Hardware name: LENOVO 
82AK/LNVNB161216, BIOS EACN43WW(V1.15) 09/13/2021 [ 7.454941] Workqueue: 
events_unbound deferred_probe_work_func [ 7.454982] pstate: 40400005 
(nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 7.455020] pc : 
dp_display_request_irq+0x50/0xdc [msm] [ 7.455145] lr : 
dp_display_request_irq+0x2c/0xdc [msm] [ 7.455265] sp : ffff800008c1bb30 
[ 7.455285] x29: ffff800008c1bb30 x28: 0000000000000000 x27: 
0000000000000000 [ 7.455327] x26: ffffc9c918420000 x25: ffffc9c9186ec570 
x24: 000000000000003a [ 7.455368] x23: ffffc9c918811d30 x22: 
ffff2a5806baa998 x21: ffff2a5806ba3410 [ 7.455410] x20: ffff2a5806baa880 
x19: ffff2a5806baa998 x18: ffffffffffffffff [ 7.455451] x17: 
0000000000000038 x16: ffffc9c9164eeb24 x15: ffffffffffffffff [ 7.455492] 
x14: ffff2a5806bc3004 x13: ffff2a5806bc3000 x12: 0000000000000000 [ 
7.455533] x11: 0000000000000040 x10: ffffc9c918493080 x9 : 
ffffc9c918493078 [ 7.455574] x8 : ffff2a5800681b88 x7 : 0000000000000000 
x6 : ffff2a5806baa880 [ 7.455616] x5 : ffffc9c8ca2de000 x4 : 
0000000000080004 x3 : 0000000000000000 [ 7.455656] x2 : ffffc9c8ca296000 
x1 : 00000000000000a8 x0 : 0000000000000000 [ 7.455698] Call trace: [ 
7.455714] dp_display_request_irq+0x50/0xdc [msm] [ 7.455834] 
dp_display_probe+0xf8/0x4a4 [msm] [ 7.455950] platform_probe+0x6c/0xc4 [ 
7.455976] really_probe+0xbc/0x2d4 [ 7.455999] 
__driver_probe_device+0x78/0xe0 [ 7.456025] 
driver_probe_device+0x3c/0x13c [ 7.456051] 
__device_attach_driver+0xb8/0x120 [ 7.456077] bus_for_each_drv+0x78/0xd0 
[ 7.456105] __device_attach+0x9c/0x1a0 [ 7.456129] 
device_initial_probe+0x18/0x2c [ 7.456154] bus_probe_device+0x9c/0xa4 [ 
7.456178] deferred_probe_work_func+0x88/0xc0 [ 7.456204] 
process_one_work+0x1d4/0x330 [ 7.456231] worker_thread+0x70/0x42c [ 
7.456255] kthread+0x10c/0x110 [ 7.456278] ret_from_fork+0x10/0x20 [ 
7.456306] Code: aa1403e6 f2a00104 f0000225 f0ffffe2 (f9400400) [ 
7.456341] ---[ end trace 0000000000000000 ]---

This is from the sc8180x, sdm850 is the same call stack, just with 
different addresses.

I do have 
https://lore.kernel.org/all/20220712132258.671263-1-dmitry.baryshkov@linaro.org/ 
applied here which makes the 10th patch not apply cleanly.

It fails actually, but I applied it manually here.
Johan Hovold Sept. 14, 2022, 6:01 a.m. UTC | #2
On Tue, Sep 13, 2022 at 03:23:10PM -0500, Steev Klimaszewski wrote:
> Hi Johan,
> 
> On 9/13/22 3:53 AM, Johan Hovold wrote:
> > The MSM DRM driver is currently broken in multiple ways with respect to
> > probe deferral. Not only does the driver currently fail to probe again
> > after a late deferral, but due to a related use-after-free bug this also
> > triggers NULL-pointer dereferences.

> > In some cases, such has for the DP IRQ, the above situation can also be
> > avoided by moving the allocation in question to the platform driver
> > probe (d0) or component bind (between d2 and d3). But as doing so is not
> > a general fix, this can be done later as a cleanup/optimisation.

> I've tested this on both sc8180x (Lenovo Flex 5G) and sdm850 (Lenovo 
> Yoga C630), and both of them show the same issue:

[ Copied the below from IRC instead as the formatting in your mail was
off. ]

> [    7.449305] platform ae9a000.displayport-controller: Fixing up cyclic dependency with ae01000.mdp
> [    7.454344] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> [    7.454406] Mem abort info:
> [    7.454423]   ESR = 0x0000000096000004
> [    7.454446]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    7.454475]   SET = 0, FnV = 0
> [    7.454494]   EA = 0, S1PTW = 0
> [    7.454512]   FSC = 0x04: level 0 translation fault
> [    7.454539] Data abort info:
> [    7.454556]   ISV = 0, ISS = 0x00000004
> [    7.454577]   CM = 0, WnR = 0
> [    7.454595] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101504000
> [    7.454629] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> [    7.454669] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [    7.454700] Modules linked in: i2c_hid_of i2c_hid leds_qcom_lpg led_class_multicolor rtc_pm8xxx msm mdt_loader gpu_sched drm_dp_aux_bus drm_display_helper drm_kms_helper drm phy_qcom_edp llcc_qcom i2c_qcom_geni phy_qcom_qmp_combo phy_qcom_snps_femto_v2 phy_qcom_qmp_ufs phy_qcom_qmp_pcie ufs_qcom pwm_bl
> [    7.454860] CPU: 2 PID: 76 Comm: kworker/u16:2 Not tainted 5.19.0-rc8-next-20220728 #26
> [    7.454902] Hardware name: LENOVO 82AK/LNVNB161216, BIOS EACN43WW(V1.15) 09/13/2021
> [    7.454941] Workqueue: events_unbound deferred_probe_work_func
> [    7.454982] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    7.455020] pc : dp_display_request_irq+0x50/0xdc [msm]
> [    7.455145] lr : dp_display_request_irq+0x2c/0xdc [msm]
> [    7.455265] sp : ffff800008c1bb30
> [    7.455285] x29: ffff800008c1bb30 x28: 0000000000000000 x27: 0000000000000000
> [    7.455327] x26: ffffc9c918420000 x25: ffffc9c9186ec570 x24: 000000000000003a
> [    7.455368] x23: ffffc9c918811d30 x22: ffff2a5806baa998 x21: ffff2a5806ba3410
> [    7.455410] x20: ffff2a5806baa880 x19: ffff2a5806baa998 x18: ffffffffffffffff
> [    7.455451] x17: 0000000000000038 x16: ffffc9c9164eeb24 x15: ffffffffffffffff
> [    7.455492] x14: ffff2a5806bc3004 x13: ffff2a5806bc3000 x12: 0000000000000000
> [    7.455533] x11: 0000000000000040 x10: ffffc9c918493080 x9 : ffffc9c918493078
> [    7.455574] x8 : ffff2a5800681b88 x7 : 0000000000000000 x6 : ffff2a5806baa880
> [    7.455616] x5 : ffffc9c8ca2de000 x4 : 0000000000080004 x3 : 0000000000000000
> [    7.455656] x2 : ffffc9c8ca296000 x1 : 00000000000000a8 x0 : 0000000000000000
> [    7.455698] Call trace:
> [    7.455714]  dp_display_request_irq+0x50/0xdc [msm]
> [    7.455834]  dp_display_probe+0xf8/0x4a4 [msm]
> [    7.455950]  platform_probe+0x6c/0xc4
> [    7.455976]  really_probe+0xbc/0x2d4
> [    7.455999]  __driver_probe_device+0x78/0xe0
> [    7.456025]  driver_probe_device+0x3c/0x13c
> [    7.456051]  __device_attach_driver+0xb8/0x120
> [    7.456077]  bus_for_each_drv+0x78/0xd0
> [    7.456105]  __device_attach+0x9c/0x1a0
> [    7.456129]  device_initial_probe+0x18/0x2c
> [    7.456154]  bus_probe_device+0x9c/0xa4
> [    7.456178]  deferred_probe_work_func+0x88/0xc0
> [    7.456204]  process_one_work+0x1d4/0x330
> [    7.456231]  worker_thread+0x70/0x42c
> [    7.456255]  kthread+0x10c/0x110
> [    7.456278]  ret_from_fork+0x10/0x20
> [    7.456306] Code: aa1403e6 f2a00104 f0000225 f0ffffe2 (f9400400)
> [    7.456341] ---[ end trace 0000000000000000 ]---

> This is from the sc8180x, sdm850 is the same call stack, just with 
> different addresses.
> 
> I do have 
> https://lore.kernel.org/all/20220712132258.671263-1-dmitry.baryshkov@linaro.org/ 
> applied here which makes the 10th patch not apply cleanly.

Yeah, that is expected. You need to drop Dmitry's series first. Once you
verified that this series works, you can add it back if you want but you
then need to restore the device pointer used when allocating the irq in
dp_display_request_irq():

-       rc = devm_request_irq(dp_display->drm_dev->dev, dp->irq,
+       rc = devm_request_irq(&dp->pdev->dev, dp->irq,
 
> It fails actually, but I applied it manually here.

Please drop that series and give this one another spin.

Johan
Steev Klimaszewski Sept. 16, 2022, 4:43 p.m. UTC | #3
Hi Johan,

On Wed, Sep 14, 2022 at 1:01 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Tue, Sep 13, 2022 at 03:23:10PM -0500, Steev Klimaszewski wrote:
> > Hi Johan,
> >
> > On 9/13/22 3:53 AM, Johan Hovold wrote:
> > > The MSM DRM driver is currently broken in multiple ways with respect to
> > > probe deferral. Not only does the driver currently fail to probe again
> > > after a late deferral, but due to a related use-after-free bug this also
> > > triggers NULL-pointer dereferences.
>
> > > In some cases, such has for the DP IRQ, the above situation can also be
> > > avoided by moving the allocation in question to the platform driver
> > > probe (d0) or component bind (between d2 and d3). But as doing so is not
> > > a general fix, this can be done later as a cleanup/optimisation.
>
> > I've tested this on both sc8180x (Lenovo Flex 5G) and sdm850 (Lenovo
> > Yoga C630), and both of them show the same issue:
>
> [ Copied the below from IRC instead as the formatting in your mail was
> off. ]
>
> > [    7.449305] platform ae9a000.displayport-controller: Fixing up cyclic dependency with ae01000.mdp
> > [    7.454344] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> > [    7.454406] Mem abort info:
> > [    7.454423]   ESR = 0x0000000096000004
> > [    7.454446]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [    7.454475]   SET = 0, FnV = 0
> > [    7.454494]   EA = 0, S1PTW = 0
> > [    7.454512]   FSC = 0x04: level 0 translation fault
> > [    7.454539] Data abort info:
> > [    7.454556]   ISV = 0, ISS = 0x00000004
> > [    7.454577]   CM = 0, WnR = 0
> > [    7.454595] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101504000
> > [    7.454629] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> > [    7.454669] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > [    7.454700] Modules linked in: i2c_hid_of i2c_hid leds_qcom_lpg led_class_multicolor rtc_pm8xxx msm mdt_loader gpu_sched drm_dp_aux_bus drm_display_helper drm_kms_helper drm phy_qcom_edp llcc_qcom i2c_qcom_geni phy_qcom_qmp_combo phy_qcom_snps_femto_v2 phy_qcom_qmp_ufs phy_qcom_qmp_pcie ufs_qcom pwm_bl
> > [    7.454860] CPU: 2 PID: 76 Comm: kworker/u16:2 Not tainted 5.19.0-rc8-next-20220728 #26
> > [    7.454902] Hardware name: LENOVO 82AK/LNVNB161216, BIOS EACN43WW(V1.15) 09/13/2021
> > [    7.454941] Workqueue: events_unbound deferred_probe_work_func
> > [    7.454982] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    7.455020] pc : dp_display_request_irq+0x50/0xdc [msm]
> > [    7.455145] lr : dp_display_request_irq+0x2c/0xdc [msm]
> > [    7.455265] sp : ffff800008c1bb30
> > [    7.455285] x29: ffff800008c1bb30 x28: 0000000000000000 x27: 0000000000000000
> > [    7.455327] x26: ffffc9c918420000 x25: ffffc9c9186ec570 x24: 000000000000003a
> > [    7.455368] x23: ffffc9c918811d30 x22: ffff2a5806baa998 x21: ffff2a5806ba3410
> > [    7.455410] x20: ffff2a5806baa880 x19: ffff2a5806baa998 x18: ffffffffffffffff
> > [    7.455451] x17: 0000000000000038 x16: ffffc9c9164eeb24 x15: ffffffffffffffff
> > [    7.455492] x14: ffff2a5806bc3004 x13: ffff2a5806bc3000 x12: 0000000000000000
> > [    7.455533] x11: 0000000000000040 x10: ffffc9c918493080 x9 : ffffc9c918493078
> > [    7.455574] x8 : ffff2a5800681b88 x7 : 0000000000000000 x6 : ffff2a5806baa880
> > [    7.455616] x5 : ffffc9c8ca2de000 x4 : 0000000000080004 x3 : 0000000000000000
> > [    7.455656] x2 : ffffc9c8ca296000 x1 : 00000000000000a8 x0 : 0000000000000000
> > [    7.455698] Call trace:
> > [    7.455714]  dp_display_request_irq+0x50/0xdc [msm]
> > [    7.455834]  dp_display_probe+0xf8/0x4a4 [msm]
> > [    7.455950]  platform_probe+0x6c/0xc4
> > [    7.455976]  really_probe+0xbc/0x2d4
> > [    7.455999]  __driver_probe_device+0x78/0xe0
> > [    7.456025]  driver_probe_device+0x3c/0x13c
> > [    7.456051]  __device_attach_driver+0xb8/0x120
> > [    7.456077]  bus_for_each_drv+0x78/0xd0
> > [    7.456105]  __device_attach+0x9c/0x1a0
> > [    7.456129]  device_initial_probe+0x18/0x2c
> > [    7.456154]  bus_probe_device+0x9c/0xa4
> > [    7.456178]  deferred_probe_work_func+0x88/0xc0
> > [    7.456204]  process_one_work+0x1d4/0x330
> > [    7.456231]  worker_thread+0x70/0x42c
> > [    7.456255]  kthread+0x10c/0x110
> > [    7.456278]  ret_from_fork+0x10/0x20
> > [    7.456306] Code: aa1403e6 f2a00104 f0000225 f0ffffe2 (f9400400)
> > [    7.456341] ---[ end trace 0000000000000000 ]---
>
> > This is from the sc8180x, sdm850 is the same call stack, just with
> > different addresses.
> >
> > I do have
> > https://lore.kernel.org/all/20220712132258.671263-1-dmitry.baryshkov@linaro.org/
> > applied here which makes the 10th patch not apply cleanly.
>
> Yeah, that is expected. You need to drop Dmitry's series first. Once you
> verified that this series works, you can add it back if you want but you
> then need to restore the device pointer used when allocating the irq in
> dp_display_request_irq():
>
> -       rc = devm_request_irq(dp_display->drm_dev->dev, dp->irq,
> +       rc = devm_request_irq(&dp->pdev->dev, dp->irq,
>
> > It fails actually, but I applied it manually here.
>
> Please drop that series and give this one another spin.
>
> Johan

I thought as much but wasn't sure.  Thanks for the clarification.
With Dmitriy's patchset backed out, this series does work as expected.

Tested-by: Steev Klimaszewski <steev@kali.org>
Johan Hovold Oct. 21, 2022, 6:27 a.m. UTC | #4
On Tue, Sep 20, 2022 at 11:06:30AM +0200, Johan Hovold wrote:
> On Tue, Sep 13, 2022 at 10:53:10AM +0200, Johan Hovold wrote:
> > The MSM DRM driver is currently broken in multiple ways with respect to
> > probe deferral. Not only does the driver currently fail to probe again
> > after a late deferral, but due to a related use-after-free bug this also
> > triggers NULL-pointer dereferences.
> > 
> > These bugs are not new but have become critical with the release of
> > 5.19 where probe is deferred in case the aux-bus EP panel driver has not
> > yet been loaded.
> > 
> > The underlying problem is lifetime issues due to careless use of
> > device-managed resources.
> 
> Any chance of getting this merged for 6.1?

Is anyone picking these up as fixes for 6.1-rc as we discussed?

Johan
 
> > Changes in v2
> >  - use a custom devres action instead of amending the AUX bus interface
> >    (Doug)
> >  - split sanity check fixes and cleanups per bridge type (Dmitry)
> >  - add another Fixes tag for the missing bridge counter reset (Dmitry)
> > 
> > 
> > Johan Hovold (10):
> >   drm/msm: fix use-after-free on probe deferral
> >   drm/msm/dp: fix memory corruption with too many bridges
> >   drm/msm/dsi: fix memory corruption with too many bridges
> >   drm/msm/hdmi: fix memory corruption with too many bridges
> >   drm/msm/dp: fix IRQ lifetime
> >   drm/msm/dp: fix aux-bus EP lifetime
> >   drm/msm/dp: fix bridge lifetime
> >   drm/msm/hdmi: fix IRQ lifetime
> >   drm/msm/dp: drop modeset sanity checks
> >   drm/msm/dsi: drop modeset sanity checks
> > 
> >  drivers/gpu/drm/msm/dp/dp_display.c | 26 +++++++++++++++++++-------
> >  drivers/gpu/drm/msm/dp/dp_parser.c  |  6 +++---
> >  drivers/gpu/drm/msm/dp/dp_parser.h  |  5 +++--
> >  drivers/gpu/drm/msm/dsi/dsi.c       |  9 +++++----
> >  drivers/gpu/drm/msm/hdmi/hdmi.c     |  7 ++++++-
> >  drivers/gpu/drm/msm/msm_drv.c       |  1 +
> >  6 files changed, 37 insertions(+), 17 deletions(-)
Abhinav Kumar Oct. 21, 2022, 4:05 p.m. UTC | #5
Hi Johan

On 10/20/2022 11:27 PM, Johan Hovold wrote:
> On Tue, Sep 20, 2022 at 11:06:30AM +0200, Johan Hovold wrote:
>> On Tue, Sep 13, 2022 at 10:53:10AM +0200, Johan Hovold wrote:
>>> The MSM DRM driver is currently broken in multiple ways with respect to
>>> probe deferral. Not only does the driver currently fail to probe again
>>> after a late deferral, but due to a related use-after-free bug this also
>>> triggers NULL-pointer dereferences.
>>>
>>> These bugs are not new but have become critical with the release of
>>> 5.19 where probe is deferred in case the aux-bus EP panel driver has not
>>> yet been loaded.
>>>
>>> The underlying problem is lifetime issues due to careless use of
>>> device-managed resources.
>>
>> Any chance of getting this merged for 6.1?
> 
> Is anyone picking these up as fixes for 6.1-rc as we discussed?
> 
> Johan

All of these except the last two ( as discussed ) have landed in the 
-fixes tree

https://gitlab.freedesktop.org/drm/msm/-/commit/6808abdb33bf90330e70a687d29f038507e06ebb

Thanks

Abhinav

>   
>>> Changes in v2
>>>   - use a custom devres action instead of amending the AUX bus interface
>>>     (Doug)
>>>   - split sanity check fixes and cleanups per bridge type (Dmitry)
>>>   - add another Fixes tag for the missing bridge counter reset (Dmitry)
>>>
>>>
>>> Johan Hovold (10):
>>>    drm/msm: fix use-after-free on probe deferral
>>>    drm/msm/dp: fix memory corruption with too many bridges
>>>    drm/msm/dsi: fix memory corruption with too many bridges
>>>    drm/msm/hdmi: fix memory corruption with too many bridges
>>>    drm/msm/dp: fix IRQ lifetime
>>>    drm/msm/dp: fix aux-bus EP lifetime
>>>    drm/msm/dp: fix bridge lifetime
>>>    drm/msm/hdmi: fix IRQ lifetime
>>>    drm/msm/dp: drop modeset sanity checks
>>>    drm/msm/dsi: drop modeset sanity checks
>>>
>>>   drivers/gpu/drm/msm/dp/dp_display.c | 26 +++++++++++++++++++-------
>>>   drivers/gpu/drm/msm/dp/dp_parser.c  |  6 +++---
>>>   drivers/gpu/drm/msm/dp/dp_parser.h  |  5 +++--
>>>   drivers/gpu/drm/msm/dsi/dsi.c       |  9 +++++----
>>>   drivers/gpu/drm/msm/hdmi/hdmi.c     |  7 ++++++-
>>>   drivers/gpu/drm/msm/msm_drv.c       |  1 +
>>>   6 files changed, 37 insertions(+), 17 deletions(-)
Johan Hovold Oct. 24, 2022, 11:33 a.m. UTC | #6
On Fri, Oct 21, 2022 at 09:05:52AM -0700, Abhinav Kumar wrote:
> Hi Johan
> 
> On 10/20/2022 11:27 PM, Johan Hovold wrote:
> > On Tue, Sep 20, 2022 at 11:06:30AM +0200, Johan Hovold wrote:
> >> On Tue, Sep 13, 2022 at 10:53:10AM +0200, Johan Hovold wrote:
> >>> The MSM DRM driver is currently broken in multiple ways with respect to
> >>> probe deferral. Not only does the driver currently fail to probe again
> >>> after a late deferral, but due to a related use-after-free bug this also
> >>> triggers NULL-pointer dereferences.
> >>>
> >>> These bugs are not new but have become critical with the release of
> >>> 5.19 where probe is deferred in case the aux-bus EP panel driver has not
> >>> yet been loaded.
> >>>
> >>> The underlying problem is lifetime issues due to careless use of
> >>> device-managed resources.
> >>
> >> Any chance of getting this merged for 6.1?
> > 
> > Is anyone picking these up as fixes for 6.1-rc as we discussed?
> > 
> > Johan
> 
> All of these except the last two ( as discussed ) have landed in the 
> -fixes tree
> 
> https://gitlab.freedesktop.org/drm/msm/-/commit/6808abdb33bf90330e70a687d29f038507e06ebb

Ah, perfect, thanks.

When do you expect to send these on so that they end up in linux-next
and eventually Linus's tree?

Note that it looks like something happened with the commit messages when
you applied these. Specifically, the Fixes tags appears to now have a
line break in them and there's also some random whitespace before your
SoB:

	Fixes: c3bf8e21
	
	 ("drm/msm/dp: Add eDP support via aux_bus")
	Cc: stable@vger.kernel.org      # 5.19
	Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
	Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
	Tested-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
	Reviewed-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
	Patchwork: https://patchwork.freedesktop.org/patch/502667/
	Link: https://lore.kernel.org/r/20220913085320.8577-8-johan+linaro@kernel.org
	
	
	Signed-off-by: Abhinav Kumar's avatarAbhinav Kumar <quic_abhinavk@quicinc.com>

It's possible just the gitlab UI that's messed up, but perhaps you can
double check before they hit linux-next, which should complain about
this otherwise.

Johan
Rob Clark Oct. 24, 2022, 3:01 p.m. UTC | #7
On Mon, Oct 24, 2022 at 4:34 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Fri, Oct 21, 2022 at 09:05:52AM -0700, Abhinav Kumar wrote:
> > Hi Johan
> >
> > On 10/20/2022 11:27 PM, Johan Hovold wrote:
> > > On Tue, Sep 20, 2022 at 11:06:30AM +0200, Johan Hovold wrote:
> > >> On Tue, Sep 13, 2022 at 10:53:10AM +0200, Johan Hovold wrote:
> > >>> The MSM DRM driver is currently broken in multiple ways with respect to
> > >>> probe deferral. Not only does the driver currently fail to probe again
> > >>> after a late deferral, but due to a related use-after-free bug this also
> > >>> triggers NULL-pointer dereferences.
> > >>>
> > >>> These bugs are not new but have become critical with the release of
> > >>> 5.19 where probe is deferred in case the aux-bus EP panel driver has not
> > >>> yet been loaded.
> > >>>
> > >>> The underlying problem is lifetime issues due to careless use of
> > >>> device-managed resources.
> > >>
> > >> Any chance of getting this merged for 6.1?
> > >
> > > Is anyone picking these up as fixes for 6.1-rc as we discussed?
> > >
> > > Johan
> >
> > All of these except the last two ( as discussed ) have landed in the
> > -fixes tree
> >
> > https://gitlab.freedesktop.org/drm/msm/-/commit/6808abdb33bf90330e70a687d29f038507e06ebb
>
> Ah, perfect, thanks.
>
> When do you expect to send these on so that they end up in linux-next
> and eventually Linus's tree?

I'll send a -fixes PR this week

> Note that it looks like something happened with the commit messages when
> you applied these. Specifically, the Fixes tags appears to now have a
> line break in them and there's also some random whitespace before your
> SoB:
>
>         Fixes: c3bf8e21
>
>          ("drm/msm/dp: Add eDP support via aux_bus")

naw, that is just some problem with gitlab's html generation, the
actual patch is fine ;-)

BR,
-R

>         Cc: stable@vger.kernel.org      # 5.19
>         Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>         Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
>         Tested-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
>         Reviewed-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
>         Patchwork: https://patchwork.freedesktop.org/patch/502667/
>         Link: https://lore.kernel.org/r/20220913085320.8577-8-johan+linaro@kernel.org
>
>
>         Signed-off-by: Abhinav Kumar's avatarAbhinav Kumar <quic_abhinavk@quicinc.com>
>
> It's possible just the gitlab UI that's messed up, but perhaps you can
> double check before they hit linux-next, which should complain about
> this otherwise.
>
> Johan