diff mbox series

drm/amd: Fix random crashes due to bad kfree

Message ID Z2yQvTyg_MWwrlj3@debian.local
State New
Series drm/amd: Fix random crashes due to bad kfree

Commit Message

Chris Bainbridge Dec. 25, 2024, 11:09 p.m. UTC
Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
available for eDP") added function dm_helpers_probe_acpi_edid, which
fetches the EDID from the BIOS by calling acpi_video_get_edid.
acpi_video_get_edid returns a pointer to the EDID, but this pointer does
not originate from kmalloc - it is actually the internal "pointer" field
from an acpi_buffer struct (which did come from kmalloc).
dm_helpers_probe_acpi_edid then attempts to kfree the EDID pointer,
resulting in memory corruption which leads to random, intermittent
crashes (e.g. 4% of boots will fail with some Oops).

Fix this by allocating a new array (which can be safely freed) for the
EDID data in acpi_video_get_edid, and correctly freeing the acpi_buffer.

The only other caller of acpi_video_get_edid is nouveau_acpi_edid:
remove the extraneous kmemdup here as the EDID data is now copied in
acpi_video_get_edid.

Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
---
 drivers/acpi/acpi_video.c              | 3 ++-
 drivers/gpu/drm/nouveau/nouveau_acpi.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
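
For readers unfamiliar with the failure mode described above, here is a minimal userspace sketch of it. It is an analogy under stated assumptions, not the kernel code: `struct fake_obj`, `fake_evaluate()` and `fake_get_edid()` are hypothetical names, and `malloc`/`free` stand in for `kmalloc`/`kfree`. The container and its payload share one allocation, so the payload pointer is an interior pointer that must never be passed to `free()`; the safe pattern is to copy the payload out and free the container.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an ACPI buffer object: header plus payload. */
struct fake_obj {
	size_t length;
	unsigned char *pointer;	/* points INTO the same allocation */
};

/* Models the evaluation call: one allocation holds header and payload. */
static struct fake_obj *fake_evaluate(const unsigned char *src, size_t len)
{
	struct fake_obj *obj = malloc(sizeof(*obj) + len);

	if (!obj)
		return NULL;
	obj->length = len;
	obj->pointer = (unsigned char *)(obj + 1);
	memcpy(obj->pointer, src, len);
	return obj;
}

/*
 * Copy the payload into its own allocation, then free the container.
 * Returns the payload length, or -1 on failure. The caller owns *edid
 * and may free() it safely.
 */
static long fake_get_edid(const unsigned char *src, size_t len, void **edid)
{
	struct fake_obj *obj;
	long ret;

	assert(edid != NULL);
	*edid = NULL;

	obj = fake_evaluate(src, len);
	if (!obj)
		return -1;

	/* Calling free(obj->pointer) here would be the bug: it is not
	 * the address that malloc() returned. */
	*edid = malloc(obj->length);
	if (*edid) {
		memcpy(*edid, obj->pointer, obj->length);
		ret = (long)obj->length;
	} else {
		ret = -1;
	}

	free(obj);	/* the container is freed by its own base address */
	return ret;
}
```

The copy costs one extra allocation, but it gives the caller a pointer with unambiguous ownership.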

Comments

Tobias Jakobi Dec. 25, 2024, 11:19 p.m. UTC | #1
Hi Chris!

On 12/26/24 00:09, Chris Bainbridge wrote:

> Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
> available for eDP") added function dm_helpers_probe_acpi_edid, which
> fetches the EDID from the BIOS by calling acpi_video_get_edid.
> acpi_video_get_edid returns a pointer to the EDID, but this pointer does
> not originate from kmalloc - it is actually the internal "pointer" field
> from an acpi_buffer struct (which did come from kmalloc).
> dm_helpers_probe_acpi_edid then attempts to kfree the EDID pointer,
> resulting in memory corruption which leads to random, intermittent
> crashes (e.g. 4% of boots will fail with some Oops).
>
> Fix this by allocating a new array (which can be safely freed) for the
> EDID data in acpi_video_get_edid, and correctly freeing the acpi_buffer.

Hmm, maybe I'm missing something here. But shouldn't it suffice to just 
remove the kfree call in dm_helpers_probe_acpi_edid()?

With best wishes,
Tobias

>
> The only other caller of acpi_video_get_edid is nouveau_acpi_edid:
> remove the extraneous kmemdup here as the EDID data is now copied in
> acpi_video_get_edid.
>
> Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
> Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
> ---
>   drivers/acpi/acpi_video.c              | 3 ++-
>   drivers/gpu/drm/nouveau/nouveau_acpi.c | 2 +-
>   2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
> index 8274a17872ed..151d1d534264 100644
> --- a/drivers/acpi/acpi_video.c
> +++ b/drivers/acpi/acpi_video.c
> @@ -1485,7 +1485,8 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
>   		if (!length)
>   			continue;
>   
> -		*edid = buffer->buffer.pointer;
> +		*edid = kmemdup(buffer->buffer.pointer, length, GFP_KERNEL);
> +		kfree(buffer);
>   		return length;
>   	}
>   
> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> index 8f0c69aad248..21b56cc7605c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> @@ -384,7 +384,7 @@ nouveau_acpi_edid(struct drm_device *dev, struct drm_connector *connector)
>   	if (ret < 0)
>   		return NULL;
>   
> -	return kmemdup(edid, EDID_LENGTH, GFP_KERNEL);
> +	return edid;
>   }
>   
>   bool nouveau_acpi_video_backlight_use_native(void)
Chris Bainbridge Dec. 26, 2024, 1:27 a.m. UTC | #2
On Thu, Dec 26, 2024 at 12:19:02AM +0100, Tobias Jakobi wrote:
> Hi Chris!
> 
> On 12/26/24 00:09, Chris Bainbridge wrote:
> 
> > Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
> > available for eDP") added function dm_helpers_probe_acpi_edid, which
> > fetches the EDID from the BIOS by calling acpi_video_get_edid.
> > acpi_video_get_edid returns a pointer to the EDID, but this pointer does
> > not originate from kmalloc - it is actually the internal "pointer" field
> > from an acpi_buffer struct (which did come from kmalloc).
> > dm_helpers_probe_acpi_edid then attempts to kfree the EDID pointer,
> > resulting in memory corruption which leads to random, intermittent
> > crashes (e.g. 4% of boots will fail with some Oops).
> > 
> > Fix this by allocating a new array (which can be safely freed) for the
> > EDID data in acpi_video_get_edid, and correctly freeing the acpi_buffer.
> 
> Hmm, maybe I'm missing something here. But shouldn't it suffice to just
> remove the kfree call in dm_helpers_probe_acpi_edid()?

Yes, that would work to fix the bad kfree, but there would be a small
memory leak of the acpi_buffer struct. It's not a huge problem since
this code is rarely run, and the Nouveau code has never tried to free
the edid buffer and apparently nobody noticed, but it would be better to
do the correct thing.

One other curiosity is this comment in the code that allocates the
memory:

case ACPI_ALLOCATE_BUFFER:
	/*
	 * Allocate a new buffer. We directectly call acpi_os_allocate here to
	 * purposefully bypass the (optionally enabled) internal allocation
	 * tracking mechanism since we only want to track internal
	 * allocations. Note: The caller should use acpi_os_free to free this
	 * buffer created via ACPI_ALLOCATE_BUFFER.
	 */

Which makes me wonder if all the calls to kfree on acpi_buffer structs
with ACPI_ALLOCATE_BUFFER in acpi_video.c should actually be calls to
acpi_os_free instead? I used kfree just for consistency with the
existing code.
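
The two points in this reply (the leak left behind by only dropping the kfree, and freeing through the same layer that allocated) can be sketched in userspace as follows. This is an illustration under assumptions, not ACPI code: `os_allocate()`, `os_free()`, `struct blob`, `get_borrowed()` and `get_copy()` are hypothetical names, and the `live_allocs` counter exists only to make the leak visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs;	/* number of allocations not yet freed */

/* Hypothetical allocation layer; allocate and free are always paired. */
static void *os_allocate(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void os_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical container with an embedded payload. */
struct blob {
	size_t len;
	unsigned char data[8];
};

static struct blob *make_blob(void)
{
	struct blob *b = os_allocate(sizeof(*b));

	if (!b)
		return NULL;
	b->len = sizeof(b->data);
	memset(b->data, 0xAB, b->len);
	return b;
}

/* Option A: return the interior pointer and never free the container.
 * No bad free happens, but the container stays allocated. */
static unsigned char *get_borrowed(void)
{
	struct blob *b = make_blob();

	return b ? b->data : NULL;
}

/* Option B: copy the payload, free the container, hand over the copy. */
static long get_copy(unsigned char **out)
{
	struct blob *b;
	long ret = -1;

	assert(out != NULL);
	*out = NULL;

	b = make_blob();
	if (!b)
		return -1;

	*out = os_allocate(b->len);
	if (*out) {
		memcpy(*out, b->data, b->len);
		ret = (long)b->len;
	}
	os_free(b);
	return ret;
}
```

With option A the counter stays at one after every call, because the caller has nothing it can legitimately free. With option B the only outstanding allocation is the copy, which the caller owns.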
Tobias Jakobi Dec. 26, 2024, 12:29 p.m. UTC | #3
On 12/26/24 02:27, Chris Bainbridge wrote:

> On Thu, Dec 26, 2024 at 12:19:02AM +0100, Tobias Jakobi wrote:
>> Hi Chris!
>>
>> On 12/26/24 00:09, Chris Bainbridge wrote:
>>
>>> Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
>>> available for eDP") added function dm_helpers_probe_acpi_edid, which
>>> fetches the EDID from the BIOS by calling acpi_video_get_edid.
>>> acpi_video_get_edid returns a pointer to the EDID, but this pointer does
>>> not originate from kmalloc - it is actually the internal "pointer" field
>>> from an acpi_buffer struct (which did come from kmalloc).
>>> dm_helpers_probe_acpi_edid then attempts to kfree the EDID pointer,
>>> resulting in memory corruption which leads to random, intermittent
>>> crashes (e.g. 4% of boots will fail with some Oops).
>>>
>>> Fix this by allocating a new array (which can be safely freed) for the
>>> EDID data in acpi_video_get_edid, and correctly freeing the acpi_buffer.
>> Hmm, maybe I'm missing something here. But shouldn't it suffice to just
>> remove the kfree call in dm_helpers_probe_acpi_edid()?
> Yes, that would work to fix the bad kfree, but there would be a small
> memory leak of the acpi_buffer struct. It's not a huge problem since
> this code is rarely run, and the Nouveau code has never tried to free
> the edid buffer and apparently nobody noticed, but it would be better to
> do the correct thing.

OK, thanks for explaining. I didn't immediately understand that 
something was leaking memory. Only that we were freeing something that 
we are not supposed to free.

> One other curiosity is this comment in the code that allocates the
> memory:
>
> case ACPI_ALLOCATE_BUFFER:
> 	/*
> 	 * Allocate a new buffer. We directectly call acpi_os_allocate here to
> 	 * purposefully bypass the (optionally enabled) internal allocation
> 	 * tracking mechanism since we only want to track internal
> 	 * allocations. Note: The caller should use acpi_os_free to free this
> 	 * buffer created via ACPI_ALLOCATE_BUFFER.
> 	 */
>
> Which makes me wonder if all the calls to kfree on acpi_buffer structs
> with ACPI_ALLOCATE_BUFFER in acpi_video.c should actually be calls to
> acpi_os_free instead? I used kfree just for consistency with the
> existing code.

Wouldn't it make more sense to do the memdup handling in 
acpi_video_device_EDID()? This way you have both alloc and free in the 
same function. But I'm no expert when it comes to the ACPI kernel code. 
Just my two cents :-D

With best wishes,
Tobias
Hans de Goede Jan. 13, 2025, 9:25 a.m. UTC | #4
Hi,

On 11-Jan-25 7:59 PM, Chris Bainbridge wrote:
> Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
> available for eDP") added function dm_helpers_probe_acpi_edid, which
> fetches the EDID from the BIOS by calling acpi_video_get_edid.
> acpi_video_get_edid returns a pointer to the EDID, but this pointer does
> not originate from kmalloc - it is actually the internal "pointer" field
> from an acpi_buffer struct (which did come from kmalloc).
> dm_helpers_probe_acpi_edid then attempts to kfree the EDID pointer,
> resulting in memory corruption which leads to random, intermittent
> crashes (e.g. 4% of boots will fail with some Oops).
> 
> Fix this by allocating a new array (which can be safely freed) for the
> EDID data, and correctly freeing the acpi_buffer pointer.
> 
> The only other caller of acpi_video_get_edid is nouveau_acpi_edid:
> remove the extraneous kmemdup here as the EDID data is now copied in
> acpi_video_device_EDID.
> 
> Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
> Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
> ---
> Changes in v2:
> 	- check kmemdup() return value
> 	- move buffer management into acpi_video_device_EDID()
> 	- return actual length value of buffer

Thanks, patch looks good to me:

Reviewed-by: Hans de Goede <hdegoede@redhat.com>

Regards,

Hans



> ---
>  drivers/acpi/acpi_video.c              | 50 ++++++++++++++------------
>  drivers/gpu/drm/nouveau/nouveau_acpi.c |  2 +-
>  2 files changed, 29 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
> index 8274a17872ed..3c627bdf2d1b 100644
> --- a/drivers/acpi/acpi_video.c
> +++ b/drivers/acpi/acpi_video.c
> @@ -610,16 +610,29 @@ acpi_video_device_lcd_get_level_current(struct acpi_video_device *device,
>  	return 0;
>  }
>  
> +/*
> + *  Arg:
> + *	device	: video output device (LCD, CRT, ..)
> + *	edid    : address for returned EDID pointer
> + *	length  : _DDC length to request (must be a multiple of 128)
> + *
> + *  Return Value:
> + *	Length of EDID (positive value) or error (negative value)
> + *
> + *  Get EDID from ACPI _DDC. On success, a pointer to the EDID data is written
> + *  to the edid address, and the length of the EDID is returned. The caller is
> + *  responsible for freeing the edid pointer.
> + */
> +
>  static int
> -acpi_video_device_EDID(struct acpi_video_device *device,
> -		       union acpi_object **edid, int length)
> +acpi_video_device_EDID(struct acpi_video_device *device, void **edid, int length)
>  {
> -	int status;
> +	acpi_status status;
>  	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
>  	union acpi_object *obj;
>  	union acpi_object arg0 = { ACPI_TYPE_INTEGER };
>  	struct acpi_object_list args = { 1, &arg0 };
> -
> +	int ret;
>  
>  	*edid = NULL;
>  
> @@ -636,16 +649,17 @@ acpi_video_device_EDID(struct acpi_video_device *device,
>  
>  	obj = buffer.pointer;
>  
> -	if (obj && obj->type == ACPI_TYPE_BUFFER)
> -		*edid = obj;
> -	else {
> +	if (obj && obj->type == ACPI_TYPE_BUFFER) {
> +		*edid = kmemdup(obj->buffer.pointer, obj->buffer.length, GFP_KERNEL);
> +		ret = *edid ? obj->buffer.length : -ENOMEM;
> +	} else {
>  		acpi_handle_debug(device->dev->handle,
>  				 "Invalid _DDC data for length %d\n", length);
> -		status = -EFAULT;
> -		kfree(obj);
> +		ret = -EFAULT;
>  	}
>  
> -	return status;
> +	kfree(obj);
> +	return ret;
>  }
>  
>  /* bus */
> @@ -1435,9 +1449,7 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
>  {
>  	struct acpi_video_bus *video;
>  	struct acpi_video_device *video_device;
> -	union acpi_object *buffer = NULL;
> -	acpi_status status;
> -	int i, length;
> +	int i, length, ret;
>  
>  	if (!device || !acpi_driver_data(device))
>  		return -EINVAL;
> @@ -1477,16 +1489,10 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
>  		}
>  
>  		for (length = 512; length > 0; length -= 128) {
> -			status = acpi_video_device_EDID(video_device, &buffer,
> -							length);
> -			if (ACPI_SUCCESS(status))
> -				break;
> +			ret = acpi_video_device_EDID(video_device, edid, length);
> +			if (ret > 0)
> +				return ret;
>  		}
> -		if (!length)
> -			continue;
> -
> -		*edid = buffer->buffer.pointer;
> -		return length;
>  	}
>  
>  	return -ENODEV;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> index 8f0c69aad248..21b56cc7605c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> @@ -384,7 +384,7 @@ nouveau_acpi_edid(struct drm_device *dev, struct drm_connector *connector)
>  	if (ret < 0)
>  		return NULL;
>  
> -	return kmemdup(edid, EDID_LENGTH, GFP_KERNEL);
> +	return edid;
>  }
>  
>  bool nouveau_acpi_video_backlight_use_native(void)
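
The return convention that v2 introduces (a positive length on success, a negative errno otherwise, with the caller retrying at smaller request sizes) can be modelled in userspace roughly as below. This is a sketch under assumptions: `fake_ddc()` and `fake_get_edid()` are hypothetical, `max_ok` is an invented parameter that simulates firmware answering only requests up to a certain size, and `calloc` stands in for the kernel's copy of the EDID data.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical model of the _DDC helper: succeeds only for requests of
 * at most max_ok bytes. On success *edid is a buffer the caller owns.
 */
static int fake_ddc(int max_ok, void **edid, int length)
{
	assert(edid != NULL);
	*edid = NULL;

	if (length > max_ok)
		return -EFAULT;		/* models invalid _DDC data */

	*edid = calloc(1, (size_t)length);
	return *edid ? length : -ENOMEM;
}

/* Try 512, 384, 256 and 128 bytes; return the first positive result. */
static int fake_get_edid(int max_ok, void **edid)
{
	int length, ret;

	for (length = 512; length > 0; length -= 128) {
		ret = fake_ddc(max_ok, edid, length);
		if (ret > 0)
			return ret;
	}

	return -ENODEV;
}
```

Because the helper resets `*edid` to NULL on every attempt and allocates only on success, a failed attempt leaves nothing for the caller to clean up.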
Mario Limonciello Jan. 13, 2025, 3:59 p.m. UTC | #5
On 1/13/2025 08:19, Mario Limonciello wrote:
> On 1/11/2025 12:59, Chris Bainbridge wrote:
>> Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if
>> available for eDP") added function dm_helpers_probe_acpi_edid, which
>> fetches the EDID from the BIOS by calling acpi_video_get_edid.
>> acpi_video_get_edid returns a pointer to the EDID, but this pointer does
>> not originate from kmalloc - it is actually the internal "pointer" field
>> from an acpi_buffer struct (which did come from kmalloc).
>> dm_helpers_probe_acpi_edid then attempts to kfree the EDID pointer,
>> resulting in memory corruption which leads to random, intermittent
>> crashes (e.g. 4% of boots will fail with some Oops).
>>
>> Fix this by allocating a new array (which can be safely freed) for the
>> EDID data, and correctly freeing the acpi_buffer pointer.
>>
>> The only other caller of acpi_video_get_edid is nouveau_acpi_edid:
>> remove the extraneous kmemdup here as the EDID data is now copied in
>> acpi_video_device_EDID.
>>
>> Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
>> Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if 
>> available for eDP")
> 
> Two minor documentation related comments to consider, otherwise I think 
> the code change looks good.  Feel free to include:
> 
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>

A few more tags to collate from another thread:

Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Closes: https://lore.kernel.org/amd-gfx/20250110175252.GBZ4FedNKqmBRaY4T3@fat_crate.local/T/#m324a23eb4c4c32fa7e89e31f8ba96c781e496fb1
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>

> 
>> ---
>> Changes in v2:
>>     - check kmemdup() return value
>>     - move buffer management into acpi_video_device_EDID()
>>     - return actual length value of buffer
>> ---
>>   drivers/acpi/acpi_video.c              | 50 ++++++++++++++------------
>>   drivers/gpu/drm/nouveau/nouveau_acpi.c |  2 +-
>>   2 files changed, 29 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
>> index 8274a17872ed..3c627bdf2d1b 100644
>> --- a/drivers/acpi/acpi_video.c
>> +++ b/drivers/acpi/acpi_video.c
>> @@ -610,16 +610,29 @@ acpi_video_device_lcd_get_level_current(struct 
>> acpi_video_device *device,
>>       return 0;
>>   }
>> +/*
>> + *  Arg:
> 
> As you've pretty much written kernel-doc, is it better to just make this
> proper kernel-doc (i.e. use /**)?
> 
>> + *    device    : video output device (LCD, CRT, ..)
>> + *    edid    : address for returned EDID pointer
>> + *    length  : _DDC length to request (must be a multiple of 128)
>> + *
>> + *  Return Value:
>> + *    Length of EDID (positive value) or error (negative value)
>> + *
>> + *  Get EDID from ACPI _DDC. On success, a pointer to the EDID data 
>> is written
>> + *  to the edid address, and the length of the EDID is returned. The 
>> caller is
> 
> Since 'EDID' and 'edid' mean different things in the context of this
> description, for the purpose of clarity I think it would be better to say
> "the edid pointer address".
> 
>> + *  responsible for freeing the edid pointer.
>> + */
>> +
>>   static int
>> -acpi_video_device_EDID(struct acpi_video_device *device,
>> -               union acpi_object **edid, int length)
>> +acpi_video_device_EDID(struct acpi_video_device *device, void **edid, 
>> int length)
>>   {
>> -    int status;
>> +    acpi_status status;
>>       struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
>>       union acpi_object *obj;
>>       union acpi_object arg0 = { ACPI_TYPE_INTEGER };
>>       struct acpi_object_list args = { 1, &arg0 };
>> -
>> +    int ret;
>>       *edid = NULL;
>> @@ -636,16 +649,17 @@ acpi_video_device_EDID(struct acpi_video_device 
>> *device,
>>       obj = buffer.pointer;
>> -    if (obj && obj->type == ACPI_TYPE_BUFFER)
>> -        *edid = obj;
>> -    else {
>> +    if (obj && obj->type == ACPI_TYPE_BUFFER) {
>> +        *edid = kmemdup(obj->buffer.pointer, obj->buffer.length, 
>> GFP_KERNEL);
>> +        ret = *edid ? obj->buffer.length : -ENOMEM;
>> +    } else {
>>           acpi_handle_debug(device->dev->handle,
>>                    "Invalid _DDC data for length %d\n", length);
>> -        status = -EFAULT;
>> -        kfree(obj);
>> +        ret = -EFAULT;
>>       }
>> -    return status;
>> +    kfree(obj);
>> +    return ret;
>>   }
>>   /* bus */
>> @@ -1435,9 +1449,7 @@ int acpi_video_get_edid(struct acpi_device 
>> *device, int type, int device_id,
>>   {
>>       struct acpi_video_bus *video;
>>       struct acpi_video_device *video_device;
>> -    union acpi_object *buffer = NULL;
>> -    acpi_status status;
>> -    int i, length;
>> +    int i, length, ret;
>>       if (!device || !acpi_driver_data(device))
>>           return -EINVAL;
>> @@ -1477,16 +1489,10 @@ int acpi_video_get_edid(struct acpi_device 
>> *device, int type, int device_id,
>>           }
>>           for (length = 512; length > 0; length -= 128) {
>> -            status = acpi_video_device_EDID(video_device, &buffer,
>> -                            length);
>> -            if (ACPI_SUCCESS(status))
>> -                break;
>> +            ret = acpi_video_device_EDID(video_device, edid, length);
>> +            if (ret > 0)
>> +                return ret;
>>           }
>> -        if (!length)
>> -            continue;
>> -
>> -        *edid = buffer->buffer.pointer;
>> -        return length;
>>       }
>>       return -ENODEV;
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/ 
>> nouveau/nouveau_acpi.c
>> index 8f0c69aad248..21b56cc7605c 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
>> @@ -384,7 +384,7 @@ nouveau_acpi_edid(struct drm_device *dev, struct 
>> drm_connector *connector)
>>       if (ret < 0)
>>           return NULL;
>> -    return kmemdup(edid, EDID_LENGTH, GFP_KERNEL);
>> +    return edid;
>>   }
>>   bool nouveau_acpi_video_backlight_use_native(void)
>
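
As an illustration of the two documentation suggestions above, the comment could be written in kernel-doc form roughly as follows. This is only a sketch of the format, not necessarily the wording of any later revision. The descriptions are taken from the v2 comment, with "the edid pointer address" substituted as suggested. The non-static prototype is included only so the fragment is a complete declaration; the function in the patch is static.

```c
struct acpi_video_device;

/**
 * acpi_video_device_EDID() - Get EDID from ACPI _DDC
 * @device: video output device (LCD, CRT, ..)
 * @edid: address for returned EDID pointer
 * @length: _DDC length to request (must be a multiple of 128)
 *
 * On success, a pointer to the EDID data is written to the edid pointer
 * address, and the length of the EDID is returned. The caller is
 * responsible for freeing the edid pointer.
 *
 * Return: Length of EDID (positive value) or error (negative value).
 */
int acpi_video_device_EDID(struct acpi_video_device *device, void **edid,
			   int length);
```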

Patch

diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c
index 8274a17872ed..151d1d534264 100644
--- a/drivers/acpi/acpi_video.c
+++ b/drivers/acpi/acpi_video.c
@@ -1485,7 +1485,8 @@  int acpi_video_get_edid(struct acpi_device *device, int type, int device_id,
 		if (!length)
 			continue;
 
-		*edid = buffer->buffer.pointer;
+		*edid = kmemdup(buffer->buffer.pointer, length, GFP_KERNEL);
+		kfree(buffer);
 		return length;
 	}
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c b/drivers/gpu/drm/nouveau/nouveau_acpi.c
index 8f0c69aad248..21b56cc7605c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
+++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
@@ -384,7 +384,7 @@  nouveau_acpi_edid(struct drm_device *dev, struct drm_connector *connector)
 	if (ret < 0)
 		return NULL;
 
-	return kmemdup(edid, EDID_LENGTH, GFP_KERNEL);
+	return edid;
 }
 
 bool nouveau_acpi_video_backlight_use_native(void)