mbox series

[0/3] drm: omapdrm: Fix excessive GEM buffers DMM/CMA usage

Message ID 1642587791-13222-1-git-send-email-ivo.g.dimitrov.75@gmail.com
Headers show
Series drm: omapdrm: Fix excessive GEM buffers DMM/CMA usage | expand

Message

Ivaylo Dimitrov Jan. 19, 2022, 10:23 a.m. UTC
This patch series fixes excessive DMM or CMA usage of GEM buffers leading to
various runtime allocation failures. The series enables daily usage of devices
without exausting limited resources like CMA or DMM space if GPU rendering is
needed.

The first patch doesn't bring any functional changes, it just moves some
TILER/DMM related code to a separate function, to simplify the review of the
next two patches.

The second patch allows off-CPU rendering to non-scanout buffers. Without that
patch, it is basically impossible to use the driver allocated GEM buffers on
OMAP3 for anything else but a basic CPU rendered examples as if we want GPU
rendering, we must allocate buffers as scanout buffers, which are CMA allocated.
CMA soon gets fragmented and we start seeing allocation failures. Such failres
in Xorg cannot be handeled gracefully, so the system is basically unusable.

Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get
fragmented with time, leading to allocation failures.

Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and
PVR EXA from https://github.com/maemo-leste/xf86-video-omap

Ivaylo Dimitrov (3):
  drm: omapdrm: simplify omap_gem_pin
  drm: omapdrm: Support exporting of non-contiguous GEM BOs
  drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER

 drivers/gpu/drm/omapdrm/omap_gem.c        | 198 +++++++++++++++++-------------
 drivers/gpu/drm/omapdrm/omap_gem.h        |   3 +-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |   5 +-
 3 files changed, 116 insertions(+), 90 deletions(-)

Comments

Ivaylo Dimitrov Feb. 14, 2022, 7:08 a.m. UTC | #1
gentle ping

On 19.01.22 г. 12:23 ч., Ivaylo Dimitrov wrote:
> This patch series fixes excessive DMM or CMA usage of GEM buffers leading to
> various runtime allocation failures. The series enables daily usage of devices
> without exausting limited resources like CMA or DMM space if GPU rendering is
> needed.
> 
> The first patch doesn't bring any functional changes, it just moves some
> TILER/DMM related code to a separate function, to simplify the review of the
> next two patches.
> 
> The second patch allows off-CPU rendering to non-scanout buffers. Without that
> patch, it is basically impossible to use the driver allocated GEM buffers on
> OMAP3 for anything else but a basic CPU rendered examples as if we want GPU
> rendering, we must allocate buffers as scanout buffers, which are CMA allocated.
> CMA soon gets fragmented and we start seeing allocation failures. Such failres
> in Xorg cannot be handeled gracefully, so the system is basically unusable.
> 
> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get
> fragmented with time, leading to allocation failures.
> 
> Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and
> PVR EXA from https://github.com/maemo-leste/xf86-video-omap
> 
> Ivaylo Dimitrov (3):
>    drm: omapdrm: simplify omap_gem_pin
>    drm: omapdrm: Support exporting of non-contiguous GEM BOs
>    drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER
> 
>   drivers/gpu/drm/omapdrm/omap_gem.c        | 198 +++++++++++++++++-------------
>   drivers/gpu/drm/omapdrm/omap_gem.h        |   3 +-
>   drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |   5 +-
>   3 files changed, 116 insertions(+), 90 deletions(-)
>
Thomas Zimmermann Feb. 16, 2022, 8:10 a.m. UTC | #2
Hi

Am 19.01.22 um 11:23 schrieb Ivaylo Dimitrov:
> This patch series fixes excessive DMM or CMA usage of GEM buffers leading to
> various runtime allocation failures. The series enables daily usage of devices
> without exausting limited resources like CMA or DMM space if GPU rendering is
> needed.
> 
> The first patch doesn't bring any functional changes, it just moves some
> TILER/DMM related code to a separate function, to simplify the review of the
> next two patches.
> 
> The second patch allows off-CPU rendering to non-scanout buffers. Without that
> patch, it is basically impossible to use the driver allocated GEM buffers on
> OMAP3 for anything else but a basic CPU rendered examples as if we want GPU
> rendering, we must allocate buffers as scanout buffers, which are CMA allocated.
> CMA soon gets fragmented and we start seeing allocation failures. Such failres
> in Xorg cannot be handeled gracefully, so the system is basically unusable.
> 
> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get
> fragmented with time, leading to allocation failures.
> 
> Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and
> PVR EXA from https://github.com/maemo-leste/xf86-video-omap
> 
> Ivaylo Dimitrov (3):
>    drm: omapdrm: simplify omap_gem_pin
>    drm: omapdrm: Support exporting of non-contiguous GEM BOs
>    drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER

With the little expertise I have with omapdrm:

Acked-by: Thomas Zimmermann <tzimmermann@suse.de>

> 
>   drivers/gpu/drm/omapdrm/omap_gem.c        | 198 +++++++++++++++++-------------
>   drivers/gpu/drm/omapdrm/omap_gem.h        |   3 +-
>   drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |   5 +-
>   3 files changed, 116 insertions(+), 90 deletions(-)
>
Tomi Valkeinen Feb. 17, 2022, 1:03 p.m. UTC | #3
Hi Ivaylo,

On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
> This patch series fixes excessive DMM or CMA usage of GEM buffers leading to
> various runtime allocation failures. The series enables daily usage of devices
> without exausting limited resources like CMA or DMM space if GPU rendering is
> needed.
> 
> The first patch doesn't bring any functional changes, it just moves some
> TILER/DMM related code to a separate function, to simplify the review of the
> next two patches.
> 
> The second patch allows off-CPU rendering to non-scanout buffers. Without that
> patch, it is basically impossible to use the driver allocated GEM buffers on
> OMAP3 for anything else but a basic CPU rendered examples as if we want GPU
> rendering, we must allocate buffers as scanout buffers, which are CMA allocated.
> CMA soon gets fragmented and we start seeing allocation failures. Such failres
> in Xorg cannot be handeled gracefully, so the system is basically unusable.
> 
> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get
> fragmented with time, leading to allocation failures.

I think this is just hacking around the problem. The problem is that 
omapdrm is being used by some as a generic buffer allocator. Those users 
should be changed to use a their own allocator or a generic allocator. 
And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would be 
scanout buffers.

Or do we have a regression in the driver? My understanding is that this 
has never really worked.

  Tomi
Ivaylo Dimitrov Feb. 17, 2022, 4:21 p.m. UTC | #4
Hi Tomi,

On 17.02.22 г. 15:03 ч., Tomi Valkeinen wrote:
> Hi Ivaylo,
> 
> On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
>> This patch series fixes excessive DMM or CMA usage of GEM buffers 
>> leading to
>> various runtime allocation failures. The series enables daily usage of 
>> devices
>> without exausting limited resources like CMA or DMM space if GPU 
>> rendering is
>> needed.
>>
>> The first patch doesn't bring any functional changes, it just moves some
>> TILER/DMM related code to a separate function, to simplify the review 
>> of the
>> next two patches.
>>
>> The second patch allows off-CPU rendering to non-scanout buffers. 
>> Without that
>> patch, it is basically impossible to use the driver allocated GEM 
>> buffers on
>> OMAP3 for anything else but a basic CPU rendered examples as if we 
>> want GPU
>> rendering, we must allocate buffers as scanout buffers, which are CMA 
>> allocated.
>> CMA soon gets fragmented and we start seeing allocation failures. Such 
>> failres
>> in Xorg cannot be handeled gracefully, so the system is basically 
>> unusable.
>>
>> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get
>> fragmented with time, leading to allocation failures.
> 
> I think this is just hacking around the problem. The problem is that 
> omapdrm is being used by some as a generic buffer allocator. Those users 

Well, the user of omap_bo interface I know is xf86-video-omap. Unless if 
by users you mean 'kernel users' which I know none.

I think that if 'we' are to teach xorg omap DDX (or any other user in 
that regard) to use GPU driver allocator for non-scanout buffers and 
omapdrm for scanout, it will become a mess. Not impossible though, just 
way more complicated than the $series. Also, why do omapdrm allow 
allocation of non-linear buffers and CPU (userspace) access to them, but 
refuses to export them to kernel drivers? Isn't that the whole point of 
DMABUF stuff? This is not consistent to me. The series fixes that 
inconsistency, nothing more.

> should be changed to use a their own allocator or a generic allocator. 

SGX driver/userspace has and uses its own allocator, however, I think 
there is more than that - what about TILER/VRFB? Do you say that SGX 
userspace shall be smart enough to requests TILER buffers from omapdrm 
when scanout buffer is requested and use its own allocator when not?

Actually I was thinking about something like that, and it is achievable 
now we have:

https://github.com/maemo-leste/sgx-ddk-um/blob/master/dbm/dbm.c (REed 
SGX 1.17 ddk gbm backend)

> And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would be 
> scanout buffers.
> 

And what about OMAP_BO_TILED_XX stuff? To me this is even more of a 
hack, but it is what it is.

Do I get it correctly that you want to get rid of omap_bo_new/_tiled and 
have only dumb buffers available in omapdrm? TBH this would be great, 
however I still don't see how a TILER/VRFB buffer would be allocated, 
given that flags in drm_mode_create_dumb is not used anywhere in the 
kernel(AFAIK). Unless all scanout buffers are allocated through 
TILER/VRFB (which is a good idea IMO).

> Or do we have a regression in the driver? My understanding is that this 
> has never really worked.
> 

There are couple of patches in omapdrm that change around BO flags and 
their meaning so I think there is a regression, as the same 
userspace/DDX on linux 5.9 results in only 2 linear buffers being 
allocated, but as SGX driver has different version as well, I can't be 
100% sure without going through a lengthy assessment of SGX 
driver/omapdrm code and patches since 5.9. Which I am not going to do as 
I don't see what the benefit will be.

Please consider this patch series as a fix to an inconsistency, as it is 
merely that, it does not really bring any new functionality in terms of 
what is allocated.

Thanks and regards,
Ivo
Tomi Valkeinen March 8, 2022, 8:51 a.m. UTC | #5
On 17/02/2022 18:21, Ivaylo Dimitrov wrote:
> Hi Tomi,
> 
> On 17.02.22 г. 15:03 ч., Tomi Valkeinen wrote:
>> Hi Ivaylo,
>>
>> On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
>>> This patch series fixes excessive DMM or CMA usage of GEM buffers 
>>> leading to
>>> various runtime allocation failures. The series enables daily usage 
>>> of devices
>>> without exausting limited resources like CMA or DMM space if GPU 
>>> rendering is
>>> needed.
>>>
>>> The first patch doesn't bring any functional changes, it just moves some
>>> TILER/DMM related code to a separate function, to simplify the review 
>>> of the
>>> next two patches.
>>>
>>> The second patch allows off-CPU rendering to non-scanout buffers. 
>>> Without that
>>> patch, it is basically impossible to use the driver allocated GEM 
>>> buffers on
>>> OMAP3 for anything else but a basic CPU rendered examples as if we 
>>> want GPU
>>> rendering, we must allocate buffers as scanout buffers, which are CMA 
>>> allocated.
>>> CMA soon gets fragmented and we start seeing allocation failures. 
>>> Such failres
>>> in Xorg cannot be handeled gracefully, so the system is basically 
>>> unusable.
>>>
>>> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get
>>> fragmented with time, leading to allocation failures.
>>
>> I think this is just hacking around the problem. The problem is that 
>> omapdrm is being used by some as a generic buffer allocator. Those users 
> 
> Well, the user of omap_bo interface I know is xf86-video-omap. Unless if 
> by users you mean 'kernel users' which I know none.
> 
> I think that if 'we' are to teach xorg omap DDX (or any other user in 
> that regard) to use GPU driver allocator for non-scanout buffers and 
> omapdrm for scanout, it will become a mess. Not impossible though, just 
> way more complicated than the $series. Also, why do omapdrm allow 
> allocation of non-linear buffers and CPU (userspace) access to them, but 
> refuses to export them to kernel drivers? Isn't that the whole point of 
> DMABUF stuff? This is not consistent to me. The series fixes that 
> inconsistency, nothing more.
> 
>> should be changed to use a their own allocator or a generic allocator. 
> 
> SGX driver/userspace has and uses its own allocator, however, I think 
> there is more than that - what about TILER/VRFB? Do you say that SGX 
> userspace shall be smart enough to requests TILER buffers from omapdrm 
> when scanout buffer is requested and use its own allocator when not?

All I'm saying is that omapdrm should not support allocating buffers 
that are not usable by the omapdrm hardware. It doesn't make any sense.

> Actually I was thinking about something like that, and it is achievable 
> now we have:
> 
> https://github.com/maemo-leste/sgx-ddk-um/blob/master/dbm/dbm.c (REed 
> SGX 1.17 ddk gbm backend)
> 
>> And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would 
>> be scanout buffers.
>>
> 
> And what about OMAP_BO_TILED_XX stuff? To me this is even more of a 
> hack, but it is what it is.

Yes, I agree, I don't think those OMAP_BO_TILED_* values should be 
exposed to userspace. But I also agree to the "it is what it is" =).

> Do I get it correctly that you want to get rid of omap_bo_new/_tiled and 
> have only dumb buffers available in omapdrm? TBH this would be great, 
> however I still don't see how a TILER/VRFB buffer would be allocated, 
> given that flags in drm_mode_create_dumb is not used anywhere in the 
> kernel(AFAIK). Unless all scanout buffers are allocated through 
> TILER/VRFB (which is a good idea IMO).

We can't get rid of those as they're userspace API.

>> Or do we have a regression in the driver? My understanding is that 
>> this has never really worked.
>>
> 
> There are couple of patches in omapdrm that change around BO flags and 
> their meaning so I think there is a regression, as the same 
> userspace/DDX on linux 5.9 results in only 2 linear buffers being 
> allocated, but as SGX driver has different version as well, I can't be 
> 100% sure without going through a lengthy assessment of SGX 
> driver/omapdrm code and patches since 5.9. Which I am not going to do as 
> I don't see what the benefit will be.
> 
> Please consider this patch series as a fix to an inconsistency, as it is 
> merely that, it does not really bring any new functionality in terms of 
> what is allocated.

I've considered, and I think I agree. The design of omapdrm + tiler is 
broken in my opinion, but it's there, it has userspace APIs, and it's 
all old code. It's probably not worth the effort to try to clean it up, 
while still somehow keeping the old userspace working.

I've had these patchesin my work branch for a while and I haven't seen 
any issues. I'll keep them there for a bit longer and I'll look at these 
patches a bit more, but I think I'll merge them at some point.

  Tomi
Ivaylo Dimitrov March 28, 2022, 3:30 p.m. UTC | #6
Hi,

On 28.03.22 г. 12:46 ч., Tomi Valkeinen wrote:
> Hi,
> 
> On 19/01/2022 12:23, Ivaylo Dimitrov wrote:
>> This patch series fixes excessive DMM or CMA usage of GEM buffers 
>> leading to
>> various runtime allocation failures. The series enables daily usage of 
>> devices
>> without exausting limited resources like CMA or DMM space if GPU 
>> rendering is
>> needed.
>>
>> The first patch doesn't bring any functional changes, it just moves some
>> TILER/DMM related code to a separate function, to simplify the review 
>> of the
>> next two patches.
>>
>> The second patch allows off-CPU rendering to non-scanout buffers. 
>> Without that
>> patch, it is basically impossible to use the driver allocated GEM 
>> buffers on
>> OMAP3 for anything else but a basic CPU rendered examples as if we 
>> want GPU
>> rendering, we must allocate buffers as scanout buffers, which are CMA 
>> allocated.
>> CMA soon gets fragmented and we start seeing allocation failures. Such 
>> failres
>> in Xorg cannot be handeled gracefully, so the system is basically 
>> unusable.
>>
>> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get
>> fragmented with time, leading to allocation failures.
>>
>> Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and
>> PVR EXA from https://github.com/maemo-leste/xf86-video-omap
>>
>> Ivaylo Dimitrov (3):
>>    drm: omapdrm: simplify omap_gem_pin
>>    drm: omapdrm: Support exporting of non-contiguous GEM BOs
>>    drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER
>>
>>   drivers/gpu/drm/omapdrm/omap_gem.c        | 198 
>> +++++++++++++++++-------------
>>   drivers/gpu/drm/omapdrm/omap_gem.h        |   3 +-
>>   drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |   5 +-
>>   3 files changed, 116 insertions(+), 90 deletions(-)
>>
> 
> I have pushed this to drm-misc-next.
> 

Great, next is VRFB and TE support for Droid4 panel, as soon as I find 
some spare time :)

Ivo