diff mbox series

[v2,07/11] video/aperture: Disable and unregister sysfb devices via aperture helpers

Message ID 20220718072322.8927-8-tzimmermann@suse.de
State New
Headers show
Series fbdev: Maintain device ownership with aperture helpers | expand

Commit Message

Thomas Zimmermann July 18, 2022, 7:23 a.m. UTC
Call sysfb_disable() before removing conflicting devices in aperture
helpers. Fixes sysfb state if fbdev has been disabled.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers")
Cc: Zack Rusin <zackr@vmware.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Changcheng Deng <deng.changcheng@zte.com.cn>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/video/aperture.c         | 14 ++++++++++++++
 drivers/video/fbdev/core/fbmem.c | 12 ------------
 2 files changed, 14 insertions(+), 12 deletions(-)

Comments

Samuel Čavoj March 20, 2023, 1:47 a.m. UTC | #1
Hi,

> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
> index f42a0d8bc211..101e13c2cf41 100644
> --- a/drivers/video/aperture.c
> +++ b/drivers/video/aperture.c
> @@ -8,6 +8,7 @@
>  #include <linux/pci.h>
>  #include <linux/platform_device.h>
>  #include <linux/slab.h>
> +#include <linux/sysfb.h>
>  #include <linux/types.h>
>  #include <linux/vgaarb.h>
> 
> @@ -286,7 +287,20 @@ int 
> aperture_remove_conflicting_devices(resource_size_t base, 
> resource_size_t si
>  #if IS_REACHABLE(CONFIG_FB)
>  	struct apertures_struct *a;
>  	int ret;
> +#endif
> +
> +	/*
> +	 * If a driver asked to unregister a platform device registered by
> +	 * sysfb, then can be assumed that this is a driver for a display
> +	 * that is set up by the system firmware and has a generic driver.
> +	 *
> +	 * Drivers for devices that don't have a generic driver will never
> +	 * ask for this, so let's assume that a real driver for the display
> +	 * was already probed and prevent sysfb to register devices later.
> +	 */
> +	sysfb_disable();

This call to sysfb_disable() has been causing trouble with regard to
VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices to
get rid of any console drivers (d173780620792c) using the device in
question, but now even unrelated drivers are getting killed. Example
situation:

Machine has two GPUs and uses efifb for the console. Efifb registers
with the aperture system the efi framebuffer region, which is covered
by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls
aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap
with the efifb on GPU1 but the efifb is killed regardless due to
the unconditional call to sysfb_disable(). The console switches
to dummy and locks up from the user perspective.
This seems unnecessary, as the device is unrelated.

I do not quite understand the comment justifying the call.

Some discussions with workarounds:
https://old.reddit.com/r/VFIO/comments/11qei4t/framebuffer_doesnt_work_anymore_after_passthrough/
https://bbs.archlinux.org/viewtopic.php?id=280512


Thanks,
Samuel
Thomas Zimmermann March 20, 2023, 9:46 a.m. UTC | #2
Hi

Am 20.03.23 um 02:47 schrieb Samuel Čavoj:
> Hi,
> 
>> diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
>> index f42a0d8bc211..101e13c2cf41 100644
>> --- a/drivers/video/aperture.c
>> +++ b/drivers/video/aperture.c
>> @@ -8,6 +8,7 @@
>>  #include <linux/pci.h>
>>  #include <linux/platform_device.h>
>>  #include <linux/slab.h>
>> +#include <linux/sysfb.h>
>>  #include <linux/types.h>
>>  #include <linux/vgaarb.h>
>>
>> @@ -286,7 +287,20 @@ int 
>> aperture_remove_conflicting_devices(resource_size_t base, 
>> resource_size_t si
>>  #if IS_REACHABLE(CONFIG_FB)
>>      struct apertures_struct *a;
>>      int ret;
>> +#endif
>> +
>> +    /*
>> +     * If a driver asked to unregister a platform device registered by
>> +     * sysfb, then can be assumed that this is a driver for a display
>> +     * that is set up by the system firmware and has a generic driver.
>> +     *
>> +     * Drivers for devices that don't have a generic driver will never
>> +     * ask for this, so let's assume that a real driver for the display
>> +     * was already probed and prevent sysfb to register devices later.
>> +     */
>> +    sysfb_disable();
> 
> This call to sysfb_disable() has been causing trouble with regard to
> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices to
> get rid of any console drivers (d173780620792c) using the device in
> question, but now even unrelated drivers are getting killed. Example
> situation:

Which drivers do you use?

Best regards
Thomas

> 
> Machine has two GPUs and uses efifb for the console. Efifb registers
> with the aperture system the efi framebuffer region, which is covered
> by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls
> aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap
> with the efifb on GPU1 but the efifb is killed regardless due to
> the unconditional call to sysfb_disable(). The console switches
> to dummy and locks up from the user perspective.
> This seems unnecessary, as the device is unrelated.
> 
> I do not quite understand the comment justifying the call.
> 
> Some discussions with workarounds:
> https://old.reddit.com/r/VFIO/comments/11qei4t/framebuffer_doesnt_work_anymore_after_passthrough/
> https://bbs.archlinux.org/viewtopic.php?id=280512
> 
> 
> Thanks,
> Samuel
Javier Martinez Canillas March 20, 2023, 10:13 a.m. UTC | #3
Thomas Zimmermann <tzimmermann@suse.de> writes:

[...]

>>> +    /*
>>> +     * If a driver asked to unregister a platform device registered by
>>> +     * sysfb, then can be assumed that this is a driver for a display
>>> +     * that is set up by the system firmware and has a generic driver.
>>> +     *
>>> +     * Drivers for devices that don't have a generic driver will never
>>> +     * ask for this, so let's assume that a real driver for the display
>>> +     * was already probed and prevent sysfb to register devices later.
>>> +     */
>>> +    sysfb_disable();
>> 
>> This call to sysfb_disable() has been causing trouble with regard to
>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices to
>> get rid of any console drivers (d173780620792c) using the device in
>> question, but now even unrelated drivers are getting killed. Example
>> situation:
>
> Which drivers do you use?
>

Also, what kernel version?

[...]

>> 
>> Machine has two GPUs and uses efifb for the console. Efifb registers
>> with the aperture system the efi framebuffer region, which is covered
>> by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls
>> aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap
>> with the efifb on GPU1 but the efifb is killed regardless due to
>> the unconditional call to sysfb_disable(). The console switches
>> to dummy and locks up from the user perspective.
>> This seems unnecessary, as the device is unrelated.
>> 

That's a bug indeed but I thought that was already fixed...
Samuel Čavoj March 20, 2023, 11:08 a.m. UTC | #4
On 2023-03-20 11:13, Javier Martinez Canillas wrote:
> Thomas Zimmermann <tzimmermann@suse.de> writes:
> 
> [...]
> 
>>>> +    /*
>>>> +     * If a driver asked to unregister a platform device registered 
>>>> by
>>>> +     * sysfb, then can be assumed that this is a driver for a 
>>>> display
>>>> +     * that is set up by the system firmware and has a generic 
>>>> driver.
>>>> +     *
>>>> +     * Drivers for devices that don't have a generic driver will 
>>>> never
>>>> +     * ask for this, so let's assume that a real driver for the 
>>>> display
>>>> +     * was already probed and prevent sysfb to register devices 
>>>> later.
>>>> +     */
>>>> +    sysfb_disable();
>>> 
>>> This call to sysfb_disable() has been causing trouble with regard to
>>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices 
>>> to
>>> get rid of any console drivers (d173780620792c) using the device in
>>> question, but now even unrelated drivers are getting killed. Example
>>> situation:
>> 
>> Which drivers do you use?

This happens with either no drivers loaded or the proprietary nvidia
driver. Nouveau is fine as it doesn't rely on efifb but brings its own.

>> 
> 
> Also, what kernel version?

I tried with 6.2.6, can build mainline and test there as well.

Thanks for help!

> 
> [...]
> 
>>> 
>>> Machine has two GPUs and uses efifb for the console. Efifb registers
>>> with the aperture system the efi framebuffer region, which is covered
>>> by a BAR resource of GPU 1. VFIO grabs GPU 2 and calls
>>> aperture_remove_conflicting_pci_devices(GPU 2). GPU 2 has no overlap
>>> with the efifb on GPU1 but the efifb is killed regardless due to
>>> the unconditional call to sysfb_disable(). The console switches
>>> to dummy and locks up from the user perspective.
>>> This seems unnecessary, as the device is unrelated.
>>> 
> 
> That's a bug indeed but I thought that was already fixed...
Javier Martinez Canillas March 20, 2023, 12:12 p.m. UTC | #5
Samuel Čavoj <samuel@cavoj.net> writes:

[...]

>>>> This call to sysfb_disable() has been causing trouble with regard to
>>>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices 
>>>> to
>>>> get rid of any console drivers (d173780620792c) using the device in
>>>> question, but now even unrelated drivers are getting killed. Example
>>>> situation:
>>> 
>>> Which drivers do you use?
>
> This happens with either no drivers loaded or the proprietary nvidia
> driver. Nouveau is fine as it doesn't rely on efifb but brings its own.
>

Which is what all DRM drivers should do. If they want to make sure that a
fbdev will be present after the DRM driver probes, then should register an
emulated fbdev.

There was an attempt to workaround that in [0], in particular patch [1]
but that effort was not continued since the only DRM driver that would be
affected is the Nvidia proprietary driver that relies on efifb/simpledrm
to have a VT.

[0]: https://patchwork.kernel.org/project/dri-devel/list/?series=711019&archive=both
[1]: https://patchwork.kernel.org/project/dri-devel/patch/20230111154112.90575-11-daniel.vetter@ffwll.ch/
Samuel Čavoj March 28, 2023, 3:19 p.m. UTC | #6
On 2023-03-20 13:12, Javier Martinez Canillas wrote:
> Samuel Čavoj <samuel@cavoj.net> writes:
> 
> [...]
> 
>>>>> This call to sysfb_disable() has been causing trouble with regard 
>>>>> to
>>>>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices
>>>>> to
>>>>> get rid of any console drivers (d173780620792c) using the device in
>>>>> question, but now even unrelated drivers are getting killed. 
>>>>> Example
>>>>> situation:
>>>> 
>>>> Which drivers do you use?
>> 
>> This happens with either no drivers loaded or the proprietary nvidia
>> driver. Nouveau is fine as it doesn't rely on efifb but brings its 
>> own.
>> 
> 
> Which is what all DRM drivers should do. If they want to make sure that 
> a
> fbdev will be present after the DRM driver probes, then should register 
> an
> emulated fbdev.

I don't see how this is specific to Nvidia or DRM drivers.

The efifb is killed if vfio-pci (or another driver which uses the
aperture system to remove conflicting drivers) is bound to ANY pci
device, regardless of whether it's nvidia's fault for not implementing
a framebuffer. Fair enough, I agree that they should, but
I for one expect my efifb to not die at a random time
when a random unrelated driver does a random thing with another
unrelated GPU.

Or is the efifb considered a stop-gap solution the only purpose of
which is early boot--before another GPU driver is loaded?

> 
> There was an attempt to workaround that in [0], in particular patch [1]
> but that effort was not continued since the only DRM driver that would 
> be
> affected is the Nvidia proprietary driver that relies on 
> efifb/simpledrm
> to have a VT.
> 
> [0]: 
> https://patchwork.kernel.org/project/dri-devel/list/?series=711019&archive=both
> [1]: 
> https://patchwork.kernel.org/project/dri-devel/patch/20230111154112.90575-11-daniel.vetter@ffwll.ch/
Javier Martinez Canillas April 4, 2023, 11:36 a.m. UTC | #7
Samuel Čavoj <samuel@cavoj.net> writes:

Hello Samuel,

> On 2023-03-20 13:12, Javier Martinez Canillas wrote:
>> Samuel Čavoj <samuel@cavoj.net> writes:
>> 
>> [...]
>> 
>>>>>> This call to sysfb_disable() has been causing trouble with regard 
>>>>>> to
>>>>>> VFIO. VFIO has been calling aperture_remove_conflicting_pci_devices
>>>>>> to
>>>>>> get rid of any console drivers (d173780620792c) using the device in
>>>>>> question, but now even unrelated drivers are getting killed. 
>>>>>> Example
>>>>>> situation:
>>>>> 
>>>>> Which drivers do you use?
>>> 
>>> This happens with either no drivers loaded or the proprietary nvidia
>>> driver. Nouveau is fine as it doesn't rely on efifb but brings its 
>>> own.
>>> 
>> 
>> Which is what all DRM drivers should do. If they want to make sure that 
>> a
>> fbdev will be present after the DRM driver probes, then should register 
>> an
>> emulated fbdev.
>
> I don't see how this is specific to Nvidia or DRM drivers.
>

Not specific to Nvidia per se but as mentioned it only affected Nvidia due
that driver relying on a different graphics driver to get a VT console.

> The efifb is killed if vfio-pci (or another driver which uses the
> aperture system to remove conflicting drivers) is bound to ANY pci
> device, regardless of whether it's nvidia's fault for not implementing
> a framebuffer. Fair enough, I agree that they should, but
> I for one expect my efifb to not die at a random time
> when a random unrelated driver does a random thing with another
> unrelated GPU.
>

There was a patch series to address that:

https://patchwork.kernel.org/project/dri-devel/list/?series=711019&archive=both

In particular, this patch:

https://patchwork.kernel.org/project/dri-devel/patch/20230111154112.90575-11-daniel.vetter@ffwll.ch/

> Or is the efifb considered a stop-gap solution the only purpose of
> which is early boot--before another GPU driver is loaded?
>

All the firmware-provided graphics drivers are really a best effort IMO,
that is something only to be used to get early video output and any in the
case of "nomodeset" (i.e: some distros have a "Safe graphics mode" boot
entry that prevents DRM drivers to be loaded but used for troubleshooting.

But as soon as a real DRM driver is probed (either in the host or a guest
when the device is passed-through), I believe that is very likely that it
won't work anymore. In other words, is not a robust way to get output and
is just a best effort.
diff mbox series

Patch

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index f42a0d8bc211..101e13c2cf41 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -8,6 +8,7 @@ 
 #include <linux/pci.h>
 #include <linux/platform_device.h>
 #include <linux/slab.h>
+#include <linux/sysfb.h>
 #include <linux/types.h>
 #include <linux/vgaarb.h>
 
@@ -286,7 +287,20 @@  int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t si
 #if IS_REACHABLE(CONFIG_FB)
 	struct apertures_struct *a;
 	int ret;
+#endif
+
+	/*
+	 * If a driver asked to unregister a platform device registered by
+	 * sysfb, then can be assumed that this is a driver for a display
+	 * that is set up by the system firmware and has a generic driver.
+	 *
+	 * Drivers for devices that don't have a generic driver will never
+	 * ask for this, so let's assume that a real driver for the display
+	 * was already probed and prevent sysfb to register devices later.
+	 */
+	sysfb_disable();
 
+#if IS_REACHABLE(CONFIG_FB)
 	a = alloc_apertures(1);
 	if (!a)
 		return -ENOMEM;
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 877de85309f3..3ee3ea018245 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -19,7 +19,6 @@ 
 #include <linux/kernel.h>
 #include <linux/major.h>
 #include <linux/slab.h>
-#include <linux/sysfb.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
 #include <linux/vt.h>
@@ -1765,17 +1764,6 @@  int remove_conflicting_framebuffers(struct apertures_struct *a,
 		do_free = true;
 	}
 
-	/*
-	 * If a driver asked to unregister a platform device registered by
-	 * sysfb, then can be assumed that this is a driver for a display
-	 * that is set up by the system firmware and has a generic driver.
-	 *
-	 * Drivers for devices that don't have a generic driver will never
-	 * ask for this, so let's assume that a real driver for the display
-	 * was already probed and prevent sysfb to register devices later.
-	 */
-	sysfb_disable();
-
 	mutex_lock(&registration_lock);
 	do_remove_conflicting_framebuffers(a, name, primary);
 	mutex_unlock(&registration_lock);