diff mbox series

[2/3] fbdev/simplefb: Cleanup fb_info in .fb_destroy rather than .remove

Message ID 20220504215722.56970-1-javierm@redhat.com
State Superseded
Headers show
Series fbdev: Fix use-after-free caused by wrong fb_info cleanup in drivers | expand

Commit Message

Javier Martinez Canillas May 4, 2022, 9:57 p.m. UTC
The driver is calling framebuffer_release() in its .remove callback, but
this will cause the struct fb_info to be freed too early. Since it could
be that a reference is still hold to it if user-space opened the fbdev.

This would lead to a use-after-free error if the framebuffer device was
unregistered but later a user-space process tries to close the fbdev fd.

The correct thing to do is to only unregister the framebuffer in the
driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---

 drivers/video/fbdev/simplefb.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Thomas Zimmermann May 5, 2022, 7:29 a.m. UTC | #1
Hi

Am 04.05.22 um 23:57 schrieb Javier Martinez Canillas:
> The driver is calling framebuffer_release() in its .remove callback, but
> this will cause the struct fb_info to be freed too early. Since it could
> be that a reference is still hold to it if user-space opened the fbdev.
> 
> This would lead to a use-after-free error if the framebuffer device was
> unregistered but later a user-space process tries to close the fbdev fd.
> 
> The correct thing to do is to only unregister the framebuffer in the
> driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy.
> 
> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>

Please see my question below.

> ---
> 
>   drivers/video/fbdev/simplefb.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c
> index 94fc9c6d0411..2c198561c338 100644
> --- a/drivers/video/fbdev/simplefb.c
> +++ b/drivers/video/fbdev/simplefb.c
> @@ -84,6 +84,10 @@ struct simplefb_par {
>   static void simplefb_clocks_destroy(struct simplefb_par *par);
>   static void simplefb_regulators_destroy(struct simplefb_par *par);
>   
> +/*
> + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end
> + * of unregister_framebuffer() or fb_release(). Do any cleanup here.
> + */
>   static void simplefb_destroy(struct fb_info *info)
>   {
>   	struct simplefb_par *par = info->par;
> @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info)
>   	if (info->screen_base)
>   		iounmap(info->screen_base);
>   
> +	framebuffer_release(info);
> +
>   	if (mem)
>   		release_mem_region(mem->start, resource_size(mem));

The original problem with fbdev hot-unplug was that vmwgfx needed the 
framebuffer region to be released. If we release it only after userspace 
closed it's final file descriptor, vmwgfx could have already failed.

I still don't fully get why this code apparently works or at least 
doesn't blow up occasionally. Any ideas?

Best regards
Thomas

>   }
> @@ -545,8 +551,8 @@ static int simplefb_remove(struct platform_device *pdev)
>   {
>   	struct fb_info *info = platform_get_drvdata(pdev);
>   
> +	/* simplefb_destroy takes care of info cleanup */
>   	unregister_framebuffer(info);
> -	framebuffer_release(info);
>   
>   	return 0;
>   }
Thomas Zimmermann May 5, 2022, 8:05 a.m. UTC | #2
Hi

Am 05.05.22 um 09:38 schrieb Javier Martinez Canillas:
> Hello Thomas,
> 
> On 5/5/22 09:29, Thomas Zimmermann wrote:
> 
> [snip]
> 
>>>    static void simplefb_destroy(struct fb_info *info)
>>>    {
>>>    	struct simplefb_par *par = info->par;
>>> @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info)
>>>    	if (info->screen_base)
>>>    		iounmap(info->screen_base);
>>>    
>>> +	framebuffer_release(info);
>>> +
>>>    	if (mem)
>>>    		release_mem_region(mem->start, resource_size(mem));
>>
>> The original problem with fbdev hot-unplug was that vmwgfx needed the
>> framebuffer region to be released. If we release it only after userspace
>> closed it's final file descriptor, vmwgfx could have already failed.
>>
>> I still don't fully get why this code apparently works or at least
>> doesn't blow up occasionally. Any ideas?
>>
> 
> I believe that vmwgfx doesn't fail to probe (or any other DRM driver)
> only when there are not user-space processes with a fbdev node opened
> since otherwise as you said the memory wouldn't be released yet.
> 
> unregister_framebuffer() is called from the driver's .remove handler
> and that decrement the fb_info refcount, so if reaches zero it will
> call to the fb fops .destroy() handler and release the I/O memory.
> 
> In other words, in most cases (i.e: only fbcon bound to the fbdev)
> the driver's removal/ device unbind and the memory release will be
> at the same time.
> 

We're one the same page here, but it's still sort of a mystery to me why 
this works in practice.

I'm specifically talking about pci_request_regions() in vmwgfx [1]. IIRC 
this would fail if simplefb still owns the framebuffer region. Lots of 
systems run Plymouth during boot and this should result in failures 
occasionally. Still, we never heard about anything.

Of course, it's always been broken (even long before real fbdev 
hotunplugging). Switching to simpledrm resolves the problem.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.17.5/source/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c#L727
Javier Martinez Canillas May 5, 2022, 8:28 a.m. UTC | #3
Hello Thomas,

On 5/5/22 10:05, Thomas Zimmermann wrote:

[snip]

>>
>> In other words, in most cases (i.e: only fbcon bound to the fbdev)
>> the driver's removal/ device unbind and the memory release will be
>> at the same time.
>>
> 
> We're one the same page here, but it's still sort of a mystery to me why 
> this works in practice.
> 
> I'm specifically talking about pci_request_regions() in vmwgfx [1]. IIRC 
> this would fail if simplefb still owns the framebuffer region. Lots of 
> systems run Plymouth during boot and this should result in failures 
> occasionally. Still, we never heard about anything.
>

Yes, I think is because Plymouth IIUC waits for a /dev/dri/card? to be
present and only uses a /dev/fb? as a fallback if a timeout expires.

At least in Fedora (even before the efifb -> simpledrm change) it will
use KMS/DRM since the DRM kernel module for the graphics device in the
machine would be in the intird.

So efifb was only used for fbcon and plymouth would only use DRM/KMS
and not its fbdev backend.

This seems to be sort of a corner case when you have {efi,simple}fb
in the early boot but the real DRM module only in the rootfs after the
initrd has done a pivot_root(2).
 
> Of course, it's always been broken (even long before real fbdev 
> hotunplugging). Switching to simpledrm resolves the problem.
>

Indeed. My opinion after dealing with these fbdev problems is that we
shouldn't try to fix all possible corner cases and just try to get rid
of fbdev as soon as possible.
 -- 
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat
Thomas Zimmermann May 5, 2022, 8:49 a.m. UTC | #4
Hi

Am 05.05.22 um 10:28 schrieb Javier Martinez Canillas:
> Hello Thomas,
> 
> On 5/5/22 10:05, Thomas Zimmermann wrote:
> 
> [snip]
> 
>>>
>>> In other words, in most cases (i.e: only fbcon bound to the fbdev)
>>> the driver's removal/ device unbind and the memory release will be
>>> at the same time.
>>>
>>
>> We're one the same page here, but it's still sort of a mystery to me why
>> this works in practice.
>>
>> I'm specifically talking about pci_request_regions() in vmwgfx [1]. IIRC
>> this would fail if simplefb still owns the framebuffer region. Lots of
>> systems run Plymouth during boot and this should result in failures
>> occasionally. Still, we never heard about anything.
>>
> 
> Yes, I think is because Plymouth IIUC waits for a /dev/dri/card? to be
> present and only uses a /dev/fb? as a fallback if a timeout expires.

Oh, right! The infamous plymouth timeout. 'sleep(30)' is the swiss-army 
knife of concurrent programming. ;)

But I'm not blaming anyone. There are situations where nothing else 
helps. Plymouth really can't do anything else here. We've received 
reports for gfx-handover bugs when the timeout expired and plymouth uses 
the fbdev. The system got stuck then because of fbdev IIRC.

Best regards
Thomas

> 
> At least in Fedora (even before the efifb -> simpledrm change) it will
> use KMS/DRM since the DRM kernel module for the graphics device in the
> machine would be in the intird.
> 
> So efifb was only used for fbcon and plymouth would only use DRM/KMS
> and not its fbdev backend.
> 
> This seems to be sort of a corner case when you have {efi,simple}fb
> in the early boot but the real DRM module only in the rootfs after the
> initrd has done a pivot_root(2).
>   
>> Of course, it's always been broken (even long before real fbdev
>> hotunplugging). Switching to simpledrm resolves the problem.
>>
> 
> Indeed. My opinion after dealing with these fbdev problems is that we
> shouldn't try to fix all possible corner cases and just try to get rid
> of fbdev as soon as possible.
>   --
> Best regards,
> 
> Javier Martinez Canillas
> Linux Engineering
> Red Hat
>
Daniel Vetter May 5, 2022, 12:52 p.m. UTC | #5
On Wed, May 04, 2022 at 11:57:22PM +0200, Javier Martinez Canillas wrote:
> The driver is calling framebuffer_release() in its .remove callback, but
> this will cause the struct fb_info to be freed too early. Since it could
> be that a reference is still hold to it if user-space opened the fbdev.
> 
> This would lead to a use-after-free error if the framebuffer device was
> unregistered but later a user-space process tries to close the fbdev fd.
> 
> The correct thing to do is to only unregister the framebuffer in the
> driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy.
> 
> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>

I think this should have a Fixes: line for the patch from Thomas which
changed the remove_conflicting_fb code:

27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")

I think we should also mention that strictly speaking the code flow is now
wrong, because hw cleanup (like iounmap) should be done from ->remove
while sw cleanup (like calling framebuffer_release()) is the only thing
that should be done from ->fb_destroy. But the current code matches what
was happening before 27599aacbaef so more minimal "fix"

With those details added Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Same for the next patch.
-Daniel

> ---
> 
>  drivers/video/fbdev/simplefb.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c
> index 94fc9c6d0411..2c198561c338 100644
> --- a/drivers/video/fbdev/simplefb.c
> +++ b/drivers/video/fbdev/simplefb.c
> @@ -84,6 +84,10 @@ struct simplefb_par {
>  static void simplefb_clocks_destroy(struct simplefb_par *par);
>  static void simplefb_regulators_destroy(struct simplefb_par *par);
>  
> +/*
> + * fb_ops.fb_destroy is called by the last put_fb_info() call at the end
> + * of unregister_framebuffer() or fb_release(). Do any cleanup here.
> + */
>  static void simplefb_destroy(struct fb_info *info)
>  {
>  	struct simplefb_par *par = info->par;
> @@ -94,6 +98,8 @@ static void simplefb_destroy(struct fb_info *info)
>  	if (info->screen_base)
>  		iounmap(info->screen_base);
>  
> +	framebuffer_release(info);
> +
>  	if (mem)
>  		release_mem_region(mem->start, resource_size(mem));
>  }
> @@ -545,8 +551,8 @@ static int simplefb_remove(struct platform_device *pdev)
>  {
>  	struct fb_info *info = platform_get_drvdata(pdev);
>  
> +	/* simplefb_destroy takes care of info cleanup */
>  	unregister_framebuffer(info);
> -	framebuffer_release(info);
>  
>  	return 0;
>  }
> -- 
> 2.35.1
>
Javier Martinez Canillas May 5, 2022, 12:57 p.m. UTC | #6
Hello Daniel,

On 5/5/22 14:52, Daniel Vetter wrote:
> On Wed, May 04, 2022 at 11:57:22PM +0200, Javier Martinez Canillas wrote:
>> The driver is calling framebuffer_release() in its .remove callback, but
>> this will cause the struct fb_info to be freed too early. Since it could
>> be that a reference is still hold to it if user-space opened the fbdev.
>>
>> This would lead to a use-after-free error if the framebuffer device was
>> unregistered but later a user-space process tries to close the fbdev fd.
>>
>> The correct thing to do is to only unregister the framebuffer in the
>> driver's .remove callback, but do any cleanup in the fb_ops.fb_destroy.
>>
>> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> 
> I think this should have a Fixes: line for the patch from Thomas which
> changed the remove_conflicting_fb code:
> 
> 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
>

Ok.
 
> I think we should also mention that strictly speaking the code flow is now
> wrong, because hw cleanup (like iounmap) should be done from ->remove
> while sw cleanup (like calling framebuffer_release()) is the only thing
> that should be done from ->fb_destroy. But the current code matches what
> was happening before 27599aacbaef so more minimal "fix"
>

Yes, I tried to make it as small as possible. Thomas pointed out that vesafb
has the same issue and I included in v2. I had move things around more there
though.
 
> With those details added Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Same for the next patch.

Thanks. I'll post a v3 adding what you suggested but probably not today.
diff mbox series

Patch

diff --git a/drivers/video/fbdev/simplefb.c b/drivers/video/fbdev/simplefb.c
index 94fc9c6d0411..2c198561c338 100644
--- a/drivers/video/fbdev/simplefb.c
+++ b/drivers/video/fbdev/simplefb.c
@@ -84,6 +84,10 @@  struct simplefb_par {
 static void simplefb_clocks_destroy(struct simplefb_par *par);
 static void simplefb_regulators_destroy(struct simplefb_par *par);
 
+/*
+ * fb_ops.fb_destroy is called by the last put_fb_info() call at the end
+ * of unregister_framebuffer() or fb_release(). Do any cleanup here.
+ */
 static void simplefb_destroy(struct fb_info *info)
 {
 	struct simplefb_par *par = info->par;
@@ -94,6 +98,8 @@  static void simplefb_destroy(struct fb_info *info)
 	if (info->screen_base)
 		iounmap(info->screen_base);
 
+	framebuffer_release(info);
+
 	if (mem)
 		release_mem_region(mem->start, resource_size(mem));
 }
@@ -545,8 +551,8 @@  static int simplefb_remove(struct platform_device *pdev)
 {
 	struct fb_info *info = platform_get_drvdata(pdev);
 
+	/* simplefb_destroy takes care of info cleanup */
 	unregister_framebuffer(info);
-	framebuffer_release(info);
 
 	return 0;
 }