diff mbox series

[5.10,637/717] drm/amd/display: Fix memory leaks in S3 resume

Message ID 20201228125051.444911072@linuxfoundation.org
State Superseded
Headers show
Series None | expand

Commit Message

Greg Kroah-Hartman Dec. 28, 2020, 12:50 p.m. UTC
From: Stylon Wang <stylon.wang@amd.com>

commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.

EDID parsing in S3 resume pushes new display modes
to probed_modes list but doesn't consolidate to actual
mode list. This creates a race condition when
amdgpu_dm_connector_ddc_get_modes() re-initializes the
list head without walking the list and results in  memory leak.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=209987
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Andre Tomt Jan. 4, 2021, 7:04 p.m. UTC | #1
On 28.12.2020 13:50, Greg Kroah-Hartman wrote:
> From: Stylon Wang <stylon.wang@amd.com>

> 

> commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.

> 

> EDID parsing in S3 resume pushes new display modes

> to probed_modes list but doesn't consolidate to actual

> mode list. This creates a race condition when

> amdgpu_dm_connector_ddc_get_modes() re-initializes the

> list head without walking the list and results in  memory leak.


This commit is causing me problems on 5.10.4: when I turn off the 
display (a LG TV in this case), and turn it back on again later there is 
no video output and I get the following in the kernel log:

[ 8245.259628] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* 
Restoring old state failed with -12

I've found another report on this commit as well:
https://bugzilla.kernel.org/show_bug.cgi?id=211033

And I suspect this is the same:
https://bugs.archlinux.org/task/69202

Reverting it from 5.10.4 makes things behave again.

Have not tested 5.4.86 or 5.11-rc.

I'm using a RX570 Polaris based card.
Oleksandr Natalenko Jan. 4, 2021, 8:10 p.m. UTC | #2
On Mon, Jan 04, 2021 at 08:04:08PM +0100, Andre Tomt wrote:
> On 28.12.2020 13:50, Greg Kroah-Hartman wrote:

> > From: Stylon Wang <stylon.wang@amd.com>

> > 

> > commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.

> > 

> > EDID parsing in S3 resume pushes new display modes

> > to probed_modes list but doesn't consolidate to actual

> > mode list. This creates a race condition when

> > amdgpu_dm_connector_ddc_get_modes() re-initializes the

> > list head without walking the list and results in  memory leak.

> 

> This commit is causing me problems on 5.10.4: when I turn off the display (a

> LG TV in this case), and turn it back on again later there is no video

> output and I get the following in the kernel log:

> 

> [ 8245.259628] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR*

> Restoring old state failed with -12


Uh, it seems you've just saved me a ton of gray hair. I have the very
same issue and I'm going to revert this patch now in order to check
whether it makes any difference.

Thanks!

> 

> I've found another report on this commit as well:

> https://bugzilla.kernel.org/show_bug.cgi?id=211033

> 

> And I suspect this is the same:

> https://bugs.archlinux.org/task/69202

> 

> Reverting it from 5.10.4 makes things behave again.

> 

> Have not tested 5.4.86 or 5.11-rc.

> 

> I'm using a RX570 Polaris based card.


-- 
  Oleksandr Natalenko (post-factum)
Oleksandr Natalenko Jan. 4, 2021, 8:40 p.m. UTC | #3
On Mon, Jan 04, 2021 at 09:10:17PM +0100, Oleksandr Natalenko wrote:
> On Mon, Jan 04, 2021 at 08:04:08PM +0100, Andre Tomt wrote:

> > On 28.12.2020 13:50, Greg Kroah-Hartman wrote:

> > > From: Stylon Wang <stylon.wang@amd.com>

> > > 

> > > commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.

> > > 

> > > EDID parsing in S3 resume pushes new display modes

> > > to probed_modes list but doesn't consolidate to actual

> > > mode list. This creates a race condition when

> > > amdgpu_dm_connector_ddc_get_modes() re-initializes the

> > > list head without walking the list and results in  memory leak.

> > 

> > This commit is causing me problems on 5.10.4: when I turn off the display (a

> > LG TV in this case), and turn it back on again later there is no video

> > output and I get the following in the kernel log:

> > 

> > [ 8245.259628] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR*

> > Restoring old state failed with -12

> 

> Uh, it seems you've just saved me a ton of gray hair. I have the very

> same issue and I'm going to revert this patch now in order to check

> whether it makes any difference.


Confirmed, reverting this patch makes my monitor light back after
turning off/on.

Also, during testing, I've noticed that with the stock v5.10.4 kernel
once reboot sequence is initiated and xorg gets killed, the monitor also
lights back and shows the console.

My HW:

0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)

> 

> Thanks!

> 

> > 

> > I've found another report on this commit as well:

> > https://bugzilla.kernel.org/show_bug.cgi?id=211033

> > 

> > And I suspect this is the same:

> > https://bugs.archlinux.org/task/69202

> > 

> > Reverting it from 5.10.4 makes things behave again.

> > 

> > Have not tested 5.4.86 or 5.11-rc.

> > 

> > I'm using a RX570 Polaris based card.


-- 
  Oleksandr Natalenko (post-factum)
Greg Kroah-Hartman Jan. 5, 2021, 6:54 a.m. UTC | #4
On Mon, Jan 04, 2021 at 08:04:08PM +0100, Andre Tomt wrote:
> On 28.12.2020 13:50, Greg Kroah-Hartman wrote:

> > From: Stylon Wang <stylon.wang@amd.com>

> > 

> > commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.

> > 

> > EDID parsing in S3 resume pushes new display modes

> > to probed_modes list but doesn't consolidate to actual

> > mode list. This creates a race condition when

> > amdgpu_dm_connector_ddc_get_modes() re-initializes the

> > list head without walking the list and results in  memory leak.

> 

> This commit is causing me problems on 5.10.4: when I turn off the display (a

> LG TV in this case), and turn it back on again later there is no video

> output and I get the following in the kernel log:

> 

> [ 8245.259628] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR*

> Restoring old state failed with -12

> 

> I've found another report on this commit as well:

> https://bugzilla.kernel.org/show_bug.cgi?id=211033

> 

> And I suspect this is the same:

> https://bugs.archlinux.org/task/69202

> 

> Reverting it from 5.10.4 makes things behave again.

> 

> Have not tested 5.4.86 or 5.11-rc.

> 

> I'm using a RX570 Polaris based card.


Can you test 5.11-rc to see if this issue is there as well?

thanks,

greg k-h
Andre Tomt Jan. 5, 2021, 4:32 p.m. UTC | #5
On 05.01.2021 07:54, Greg Kroah-Hartman wrote:
> On Mon, Jan 04, 2021 at 08:04:08PM +0100, Andre Tomt wrote:

>> On 28.12.2020 13:50, Greg Kroah-Hartman wrote:

>>> From: Stylon Wang <stylon.wang@amd.com>

>>>

>>> commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.

>>>

>>> EDID parsing in S3 resume pushes new display modes

>>> to probed_modes list but doesn't consolidate to actual

>>> mode list. This creates a race condition when

>>> amdgpu_dm_connector_ddc_get_modes() re-initializes the

>>> list head without walking the list and results in  memory leak.

>>

>> This commit is causing me problems on 5.10.4: when I turn off the display (a

>> LG TV in this case), and turn it back on again later there is no video

>> output and I get the following in the kernel log:

>>

>> [ 8245.259628] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR*

>> Restoring old state failed with -12

>>

>> I've found another report on this commit as well:

>> https://bugzilla.kernel.org/show_bug.cgi?id=211033

>>

>> And I suspect this is the same:

>> https://bugs.archlinux.org/task/69202

>>

>> Reverting it from 5.10.4 makes things behave again.

>>

>> Have not tested 5.4.86 or 5.11-rc.

>>

>> I'm using a RX570 Polaris based card.

> 

> Can you test 5.11-rc to see if this issue is there as well?


Just did, and have the same issue on 5.11-rc2. Reverting it also solves 
the problem on 5.11-rc2, as it does on 5.10.4

FWIW one easy way to reproduce seems to be unplugging and re-plugging 
the HDMI.
Deucher, Alexander Jan. 5, 2021, 4:41 p.m. UTC | #6
[AMD Public Use]

> -----Original Message-----

> From: Andre Tomt <andre@tomt.net>

> Sent: Tuesday, January 5, 2021 11:32 AM

> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

> Cc: linux-kernel@vger.kernel.org; stable@vger.kernel.org; Wentland, Harry

> <Harry.Wentland@amd.com>; Deucher, Alexander

> <Alexander.Deucher@amd.com>; Kazlauskas, Nicholas

> <Nicholas.Kazlauskas@amd.com>; Wang, Chao-kai (Stylon)

> <Stylon.Wang@amd.com>

> Subject: Re: [PATCH 5.10 637/717] drm/amd/display: Fix memory leaks in S3

> resume

> 

> On 05.01.2021 07:54, Greg Kroah-Hartman wrote:

> > On Mon, Jan 04, 2021 at 08:04:08PM +0100, Andre Tomt wrote:

> >> On 28.12.2020 13:50, Greg Kroah-Hartman wrote:

> >>> From: Stylon Wang <stylon.wang@amd.com>

> >>>

> >>> commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362 upstream.

> >>>

> >>> EDID parsing in S3 resume pushes new display modes to probed_modes

> >>> list but doesn't consolidate to actual mode list. This creates a

> >>> race condition when

> >>> amdgpu_dm_connector_ddc_get_modes() re-initializes the list head

> >>> without walking the list and results in  memory leak.

> >>

> >> This commit is causing me problems on 5.10.4: when I turn off the

> >> display (a LG TV in this case), and turn it back on again later there

> >> is no video output and I get the following in the kernel log:

> >>

> >> [ 8245.259628] [drm:dm_restore_drm_connector_state [amdgpu]]

> *ERROR*

> >> Restoring old state failed with -12

> >>

> >> I've found another report on this commit as well:

> >>

> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbug

> >>

> zilla.kernel.org%2Fshow_bug.cgi%3Fid%3D211033&amp;data=04%7C01%7Cal

> ex

> >>

> ander.deucher%40amd.com%7Cad673e351bab4f6af94508d8b1977ed8%7C3d

> d8961f

> >>

> e4884e608e11a82d994e183d%7C0%7C0%7C637454612140560971%7CUnknow

> n%7CTWF

> >>

> pbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXV

> CI6

> >>

> Mn0%3D%7C3000&amp;sdata=8Rsnbfh4P5GmFUlybb31mT7C0Ee4vDInxJ1gt

> C3jrVI%3

> >> D&amp;reserved=0

> >>

> >> And I suspect this is the same:

> >>

> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbug

> >>

> s.archlinux.org%2Ftask%2F69202&amp;data=04%7C01%7Calexander.deuche

> r%4

> >>

> 0amd.com%7Cad673e351bab4f6af94508d8b1977ed8%7C3dd8961fe4884e608

> e11a82

> >>

> d994e183d%7C0%7C0%7C637454612140560971%7CUnknown%7CTWFpbGZsb

> 3d8eyJWIj

> >>

> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3

> 000&a

> >>

> mp;sdata=1149tCcm3rcaj1MkVPbWqWhWFIPgkeBoYxo0oVv%2FzNI%3D&a

> mp;reserve

> >> d=0

> >>

> >> Reverting it from 5.10.4 makes things behave again.

> >>

> >> Have not tested 5.4.86 or 5.11-rc.

> >>

> >> I'm using a RX570 Polaris based card.

> >

> > Can you test 5.11-rc to see if this issue is there as well?

> 

> Just did, and have the same issue on 5.11-rc2. Reverting it also solves the

> problem on 5.11-rc2, as it does on 5.10.4

> 

> FWIW one easy way to reproduce seems to be unplugging and re-plugging

> the HDMI.


We are looking into the root cause, but I'll send out the revert for now.

Thanks,

Alex
diff mbox series

Patch

--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2278,7 +2278,8 @@  void amdgpu_dm_update_connector_after_de
 
 			drm_connector_update_edid_property(connector,
 							   aconnector->edid);
-			drm_add_edid_modes(connector, aconnector->edid);
+			aconnector->num_modes = drm_add_edid_modes(connector, aconnector->edid);
+			drm_connector_list_update(connector);
 
 			if (aconnector->dc_link->aux_mode)
 				drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux,