[v4,02/14] Documentation: media: description of DMABUF importing in V4L2

Message ID 1334332076-28489-3-git-send-email-t.stanislaws@samsung.com
State Superseded, archived

Commit Message

Tomasz Stanislawski April 13, 2012, 3:47 p.m. UTC
This patch adds description and usage examples for importing
DMABUF file descriptor in V4L2.

Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 Documentation/DocBook/media/v4l/compat.xml         |    4 +
 Documentation/DocBook/media/v4l/io.xml             |  179 ++++++++++++++++++++
 .../DocBook/media/v4l/vidioc-create-bufs.xml       |    1 +
 Documentation/DocBook/media/v4l/vidioc-qbuf.xml    |   15 ++
 Documentation/DocBook/media/v4l/vidioc-reqbufs.xml |   47 +++---
 5 files changed, 224 insertions(+), 22 deletions(-)

Comments

Laurent Pinchart April 16, 2012, 11:25 p.m. UTC | #1
Hi Tomasz,

Thanks for the patch.

On Friday 13 April 2012 17:47:44 Tomasz Stanislawski wrote:
> This patch adds description and usage examples for importing
> DMABUF file descriptor in V4L2.
> 
> Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

[snip]

> diff --git a/Documentation/DocBook/media/v4l/io.xml
> b/Documentation/DocBook/media/v4l/io.xml index b815929..dc5979d 100644
> --- a/Documentation/DocBook/media/v4l/io.xml
> +++ b/Documentation/DocBook/media/v4l/io.xml
> @@ -472,6 +472,162 @@ rest should be evident.</para>
>        </footnote></para>
>    </section>
> 
> +  <section id="dmabuf">
> +    <title>Streaming I/O (DMA buffer importing)</title>

This section is very similar to the Streaming I/O (User Pointers) section. Do 
you think we should merge the two? I could handle that if you want.
Tomasz Stanislawski April 19, 2012, 2:32 p.m. UTC | #2
On 04/17/2012 01:25 AM, Laurent Pinchart wrote:
> Hi Tomasz,
> 
> Thanks for the patch.
> 
> On Friday 13 April 2012 17:47:44 Tomasz Stanislawski wrote:
>> This patch adds description and usage examples for importing
>> DMABUF file descriptor in V4L2.
>>
>> Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> 
> [snip]
> 
>> diff --git a/Documentation/DocBook/media/v4l/io.xml
>> b/Documentation/DocBook/media/v4l/io.xml index b815929..dc5979d 100644
>> --- a/Documentation/DocBook/media/v4l/io.xml
>> +++ b/Documentation/DocBook/media/v4l/io.xml
>> @@ -472,6 +472,162 @@ rest should be evident.</para>
>>        </footnote></para>
>>    </section>
>>
>> +  <section id="dmabuf">
>> +    <title>Streaming I/O (DMA buffer importing)</title>
> 
> This section is very similar to the Streaming I/O (User Pointers) section. Do 
> you think we should merge the two ? I could handle that if you want.
> 

Hi Laurent,

One may find similar sentences in the MMAP, USERPTR and DMABUF sections.
Maybe the common parts, like the description of STREAMON/OFF and
QBUF/DQBUF shuffling, should be moved to a separate section
like "Streaming" :).

Maybe it is worth introducing a separate patch for this change.

Frankly, I would prefer to keep the Doc in its current form until
importer support gets merged. The Doc could be fixed later.

BTW, what is the point of merging the userptr and dmabuf sections
if userptr is going to be dropped in the long term?

Regards,
Tomasz Stanislawski
Mauro Carvalho Chehab April 19, 2012, 8:36 p.m. UTC | #3
Em 19-04-2012 11:32, Tomasz Stanislawski escreveu:
> On 04/17/2012 01:25 AM, Laurent Pinchart wrote:
>> Hi Tomasz,
>>
>> Thanks for the patch.
>>
>> On Friday 13 April 2012 17:47:44 Tomasz Stanislawski wrote:
>>> This patch adds description and usage examples for importing
>>> DMABUF file descriptor in V4L2.
>>>
>>> Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>
>> [snip]
>>
>>> diff --git a/Documentation/DocBook/media/v4l/io.xml
>>> b/Documentation/DocBook/media/v4l/io.xml index b815929..dc5979d 100644
>>> --- a/Documentation/DocBook/media/v4l/io.xml
>>> +++ b/Documentation/DocBook/media/v4l/io.xml
>>> @@ -472,6 +472,162 @@ rest should be evident.</para>
>>>        </footnote></para>
>>>    </section>
>>>
>>> +  <section id="dmabuf">
>>> +    <title>Streaming I/O (DMA buffer importing)</title>
>>
>> This section is very similar to the Streaming I/O (User Pointers) section. Do 
>> you think we should merge the two ? I could handle that if you want.
>>
> 
> Hi Laurent,
> 
> One may find similar sentences in MMAP, USERPTR and DMABUF.
> Maybe the common parts like description of STREAMON/OFF,
> QBUF/DQBUF shuffling should be moved to separate section
> like "Streaming" :).
> 
> Maybe it is worth to introduce a separate patch for this change.
> 
> Frankly, I would prefer to keep the Doc in the current form till
> importer support gets merged. Later the Doc could be fixed.
> 
> BTW. What is the sense of merging userptr and dmabuf section
> if userptr is going to dropped in long-term?

I haven't read the rest of the thread, so sorry if I'm making wrong assumptions...
Am I misunderstanding, or are you saying that you want to drop userptr
from the V4L2 API in the long term?

> 
> Regards,
> Tomasz Stanislawski
Mauro Carvalho Chehab April 19, 2012, 8:37 p.m. UTC | #4
Em 19-04-2012 11:32, Tomasz Stanislawski escreveu:
 
> Hi Laurent,
> 
> One may find similar sentences in MMAP, USERPTR and DMABUF.
> Maybe the common parts like description of STREAMON/OFF,
> QBUF/DQBUF shuffling should be moved to separate section
> like "Streaming" :).
> 
> Maybe it is worth to introduce a separate patch for this change.
> 
> Frankly, I would prefer to keep the Doc in the current form till
> importer support gets merged. Later the Doc could be fixed.
> 
> BTW. What is the sense of merging userptr and dmabuf section
> if userptr is going to dropped in long-term?

I haven't read the rest of the thread yet, so sorry if I'm making wrong assumptions...
Am I misunderstanding, or are you saying that you want to drop userptr
from the V4L2 API in the long term? If so, why?

Regards,
Mauro
Tomasz Stanislawski April 20, 2012, 8:41 a.m. UTC | #5
Hi Mauro,

On 04/19/2012 10:37 PM, Mauro Carvalho Chehab wrote:
> Em 19-04-2012 11:32, Tomasz Stanislawski escreveu:
>  
>> Hi Laurent,
>>
>> One may find similar sentences in MMAP, USERPTR and DMABUF.
>> Maybe the common parts like description of STREAMON/OFF,
>> QBUF/DQBUF shuffling should be moved to separate section
>> like "Streaming" :).
>>
>> Maybe it is worth to introduce a separate patch for this change.
>>
>> Frankly, I would prefer to keep the Doc in the current form till
>> importer support gets merged. Later the Doc could be fixed.
>>
>> BTW. What is the sense of merging userptr and dmabuf section
>> if userptr is going to dropped in long-term?
> 
> I didn't read yet the rest of the thread, so sorry, if I'm making wrong assumptions...
> Am I understanding wrong or are you saying that you want to drop userptr
> from V4L2 API in long-term? If so, why?

Dropping userptr is just a brainstorming idea.
It turned out that userptr is not a good means
of buffer exchange between two devices.
USERPTR simplifies userspace code but introduces
a lot of complexity for the kernel drivers
and frameworks.

The problem is that memory mmapped to userspace may
not be part of system memory. This often happens for
devices that use remap_pfn_range or dma_mmap_* to mmap the
memory to userspace.

It was found empirically that another device cannot
access this kind of memory without platform-specific
hacks or workarounds.

DMABUF was introduced to help in such cases.

The basic short-term idea is to drop userptr support for
buffers that are mmapped by another device.

Userptr will be used for memory allocated using malloc
(anonymous pages) or (maybe) mmapped files. There are of
course cache synchronization problems, but they are
a lesser concern.

However, this approach will work only for devices that
have their own IOMMU which can be configured to access system
memory. Otherwise, the memory has to be copied anyway
to the device's own buffers.

Moreover, copying a large amount of data should not happen
in kernel space.

All these reasons make userptr an unreliable feature that is
complex to implement.

So my rough idea was to remove USERPTR support from kernel
drivers (if possible, of course) and to provide an emulation
layer in userspace code like libv4l2.

Please note that it is only a rough idea. Just brainstorming :)

It is *too early* to start any discussion on this topic,
especially until DMABUF is mature enough to become a good
alternative to userptr.

Regards,
Tomasz Stanislawski

> 
> Regards,
> Mauro
>
Rémi Denis-Courmont April 20, 2012, 10:56 a.m. UTC | #6
On Fri, 20 Apr 2012 10:41:37 +0200, Tomasz Stanislawski
<t.stanislaws@samsung.com> wrote:
>> Am I understanding wrong or are you saying that you want to drop userptr
>> from V4L2 API in long-term? If so, why?
>
> Dropping userptr is just some brainstorming idea.
> It was found out that userptr is not a good mean
> for buffer exchange between to two devices.

I can believe that. But I am also inclined to believe that DMABUF is
targeted at device-to-device transfer, while USERPTR is targeted at
device-to-user (or user-to-device) transfers. Are you saying applications
should use DMABUF and memory map the buffers? Or would you care to explain
how DMABUF addresses the problem space of USERPTR?

> The USERPTR simplifies userspace code but introduce
> a lot of complexity problems for the kernel drivers
> and frameworks.

It is not only a simplification. In some cases, USERPTR is the only I/O
method that supports zero copy in pretty much any circumstance. When the
user cannot reliably predict the maximum number of required buffers,
predicts a value larger than the device will negotiate, or needs buffers to
outlive STREAMOFF (?), MMAP requires memory copying. USERPTR does not.

Now, I do realize that some devices cannot support USERPTR efficiently,
then they should not support USERPTR. But for those devices that can, it
seems quite a nice performance enhancement.

-- 
Rémi Denis-Courmont
Sent from my collocated server
Tomasz Stanislawski April 20, 2012, 12:25 p.m. UTC | #7
Hi Remi,

On 04/20/2012 12:56 PM, Rémi Denis-Courmont wrote:
> On Fri, 20 Apr 2012 10:41:37 +0200, Tomasz Stanislawski
> <t.stanislaws@samsung.com> wrote:
>>> Am I understanding wrong or are you saying that you want to drop
> userptr
>>> from V4L2 API in long-term? If so, why?
>>
>> Dropping userptr is just some brainstorming idea.
>> It was found out that userptr is not a good mean
>> for buffer exchange between to two devices.
> 
> I can believe that. But I am also inclined to believe that DMABUF is
> targetted at device-to-device transfer, while USERPTR is targetted at
> device-to-user (or user-to-device) transfers. Are you saying applications
> should use DMABUF and memory map the buffers?

No. As I said before: it is *too early* to drop userptr and expect applications
to use DMABUF and MMAP only. This was just a hypothetical idea.

DMABUF is dedicated to dev-dev transfers. However, looking at the current
pace at which DMABUF extensions appear, it may be expected that one day
it will support creating a DMA buffer from a user pointer.
There are already extensions for MMAP and cache synchronization.
Who knows what will happen in future versions. However, these are only
hypothetical issues.

> Or would you care to explain
> how DMABUF addresses the problem space of USERPTR?

By allowing a DMABUF to be attached to some userptr using some new magic IOCTL.
I think that sooner or later someone will find this feature useful.

> 
>> The USERPTR simplifies userspace code but introduce
>> a lot of complexity problems for the kernel drivers
>> and frameworks.
> 
> It is not only a simplification. In some cases, USERPTR is the only I/O
> method that supports zero copy in pretty much any circumstance.

Only for devices that have their own IOMMU that can access system memory.
Moreover, the userptr must come from malloc or be an mmapped file.
The other case is drivers that touch memory using the CPU in kernel
space, like VIVI or USB drivers.

> When the user cannot reliably predict the maximum number of required buffers,
> predicts a value larger than the device will negotiate, or needs buffers to
> outlive STREAMOFF (?), MMAP requires memory copying. USERPTR does not.

What does "outlive STREAMOFF" mean in this context?

Anyway, IMO allocating the buffers at VIDIOC_REQBUFS was not the best idea
because it introduces an allocation overhead when negotiating the number
of buffers. Allocating at mmap was too late. There is a need for some
intermediate state between REQBUFS and mmap. The VIDIOC_PREPARE_BUF ioctl
may help here.

Can you give me an example of a sane application that is forced to negotiate
a larger number of buffers than it is actually going to use?

> 
> Now, I do realize that some devices cannot support USERPTR efficiently,
> then they should not support USERPTR. 

The problem is that there is *NO* device that can handle USERPTR reliably.
They can handle USERPTR generated by malloc or the page cache (not sure).
Memory mmapped from other devices, frameworks etc. may or may not work.
Even if the device has its own IOMMU, the DMA layer provides no generic way to
transform a mapping from one device into a mapping in some other device.

It is done using platform-dependent hacks like extracting PFNs from mappings,
hack-forming them into struct pages or scatterlists, mapping them and hoping
that the memory is not going to be released by some other thread.

The only sure way is to copy data from userptr to an MMAP buffer.

> But for those devices that can, it
> seems quite a nice performance enhancement.

Userptr has its niches where it works pretty well, like webcams or VIVI.
I am saying that if DMABUF ever becomes a good alternative to USERPTR,
then maybe we should consider encouraging dropping USERPTR in new
drivers as an 'obsolete' feature and providing some emulation layer in libv4l2
for legacy applications.

Regards,
Tomasz Stanislawski
Rémi Denis-Courmont April 20, 2012, 1:03 p.m. UTC | #8
On Fri, 20 Apr 2012 14:25:01 +0200, Tomasz Stanislawski
<t.stanislaws@samsung.com> wrote:
>>> The USERPTR simplifies userspace code but introduce
>>> a lot of complexity problems for the kernel drivers
>>> and frameworks.
>>
>> It is not only a simplification. In some cases, USERPTR is the only I/O
>> method that supports zero copy in pretty much any circumstance.
>
> Only for devices that have its own IOMMU that can access system memory.

Newer versions of the UVC driver support USERPTR, and seemingly gspca does
too. That is practically all USB capture devices... That might be
irrelevant for a smartphone manufacturer, but it is very relevant for desktop
applications.

> Moreover the userptr must come from malloc or be a mmaped file.
> The other case are drivers that touch memory using CPU in the kernel
> space like VIVI or USB drivers.

I'd argue that USB is the most common case of V4L2 on the desktop...

>> When the user cannot reliably predict the maximum number of required
>> buffers, predicts a value larger than the device will negotiate, or
>> needs buffers to outlive STREAMOFF (?), MMAP requires memory copying.
>> USERPTR does not.
>
> What does outlive STREAMOFF means in this context?

Depending on how your multimedia pipeline is built, it is plausible that the
V4L2 source is shut down (STREAMOFF then close()) before buffers coming from
it are released/destroyed downstream. I might be wrong, but I would expect
that V4L2 MMAP buffers become invalid after STREAMOFF+close()?

> Anyway, IMO allocation of the buffers at VIDIOC_REQBUFS was not the best
> idea because it introduces an allocation overhead for negotiations of
> the number of the buffers. An allocation at mmap was to late. There is a
> need for some intermediate state between REQBUFS and mmap. The ioctl
> BUF_PREPARE may help here.
>
> Can you give me an example of a sane application is forced to negotiate a
> larger number of buffers than it is actually going to use?

Outside the embedded world, the application typically does not know what
the latency of the multimedia pipeline is. If the latency is not known, the
number of buffers needed for zero copy cannot be precomputed for REQBUFS,
say:

count = 1 + latency / frame interval.

Even for a trivial analog TV viewer application, lip synchronization
requires picture frames to be buffered long enough to account for
the latency of the audio input, dejitter, filtering and audio output. Those
values are usually not well determined at the time of requesting buffers
from the video capture device. Also the application may want to play nice
with PulseAudio. Then it will get very long audio buffers with very few
audio periods... more latency.

It gets harder or outright impossible for frameworks dealing with
complicated or arbitrary pipelines such as LibVLC or gstreamer. There is
far too much unpredictability and variability downstream of the V4L2 source
to estimate latency, and infer the number of buffers needed.
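
As a rough illustration of the formula above (a sketch under stated
assumptions, not code from any application): the helper name
buffers_for_latency is hypothetical, times are assumed to be in
microseconds, and the quotient is rounded up, which the formula itself does
not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of capture buffers needed to cover a given
 * pipeline latency, following count = 1 + latency / frame interval.
 * Assumption: the quotient is rounded up, so that a partial frame
 * interval still gets its own buffer. */
static uint32_t buffers_for_latency(uint64_t latency_us,
                                    uint64_t frame_interval_us)
{
    assert(frame_interval_us > 0);
    uint64_t frames = (latency_us + frame_interval_us - 1) / frame_interval_us;
    return (uint32_t)(1 + frames);
}
```

With a 40 ms frame interval (25 fps) and 200 ms of downstream latency this
gives 6 buffers. The point of the paragraph above still stands: the latency
argument is usually unknown when REQBUFS is called.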

>> Now, I do realize that some devices cannot support USERPTR efficiently,
>> then they should not support USERPTR.
>
> The problem is not there is *NO* device that can handle USERPTR reliably.
> The can handle USERPTR generated by malloc or page cache (not sure).
> Memory mmaped from other devices, frameworks etc may or may not work.
> Even if the device has its IOMMU the DMA layer provides no generic way to
> transform from one device to the mapping in some other device.

I'm not saying that USERPTR should replace DMABUF. I'm saying USERPTR has
advantages over MMAP that DMABUF does not seem to cover as yet (if only
libv4l2 would not inhibit USERPTR...).

I'm definitely not saying that applications should rely on USERPTR being
supported. We agree that not all devices can support USERPTR.

> The userptr has its niches were it works pretty well like Web cams or VIVI.
> I am saying that if ever DMABUF becomes a good alternative for USERPTR
> than maybe we should consider encouraging dropping USERPTR in the new
> drivers as 'obsolete' feature and providing some emulation layer in libv4l2
> for legacy applications.

Sure.

-- 
Rémi Denis-Courmont
Sent from my collocated server
Mauro Carvalho Chehab April 20, 2012, 1:36 p.m. UTC | #9
Em 20-04-2012 07:56, Rémi Denis-Courmont escreveu:
> On Fri, 20 Apr 2012 10:41:37 +0200, Tomasz Stanislawski
> <t.stanislaws@samsung.com> wrote:
>>> Am I understanding wrong or are you saying that you want to drop
> userptr
>>> from V4L2 API in long-term? If so, why?
>>
>> Dropping userptr is just some brainstorming idea.
>> It was found out that userptr is not a good mean
>> for buffer exchange between to two devices.
> 
> I can believe that. But I am also inclined to believe that DMABUF is
> targetted at device-to-device transfer, while USERPTR is targetted at
> device-to-user (or user-to-device) transfers. Are you saying applications
> should use DMABUF and memory map the buffers? Or would you care to explain
> how DMABUF addresses the problem space of USERPTR?

I agree with Rémi. Userptr was never meant to be used for dev2dev
transfers. The overlay mode was designed for that.

I remember pointing this out a few times on the mailing list.

DMABUF is the proper replacement for the overlay mode, and, once
it is fully implemented, we can deprecate and remove the overlay
mode.
> 
>> The USERPTR simplifies userspace code but introduce
>> a lot of complexity problems for the kernel drivers
>> and frameworks.
> 
> It is not only a simplification. In some cases, USERPTR is the only I/O
> method that supports zero copy in pretty much any circumstance. When the
> user cannot reliably predict the maximum number of required buffers,
> predicts a value larger than the device will negotiate, or needs buffers to
> outlive STREAMOFF (?), MMAP requires memory copying. USERPTR does not.

Yes, that's my understanding too. USERPTR helps to
avoid buffer copying.
> 
> Now, I do realize that some devices cannot support USERPTR efficiently,
> then they should not support USERPTR. But for those devices that can, it
> seems quite a nice performance enhancement.

Agreed.

A quick note about that: for USB devices, with the current implementations,
there will always be a copy inside the kernel, as the USB and other transport
headers have to be removed.

For them, the cost of MMAP and USERPTR is the same (not all USB drivers
export USERPTR, because of a limitation in videobuf-vmalloc).
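
To make the reason for that copy concrete, here is a simplified userspace
sketch of what such a driver has to do per packet. It is not taken from any
real driver: the helper name append_payload and the fixed hdr_len parameter
are hypothetical, and real drivers parse the header length from the packet
itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append one transport packet's payload to a frame
 * buffer, skipping hdr_len bytes of transport header. *used is the current
 * fill level of the frame and is advanced on success. Returns false if the
 * packet is malformed or the frame buffer would overflow. */
static bool append_payload(unsigned char *frame, size_t frame_size,
                           size_t *used, const unsigned char *pkt,
                           size_t pkt_len, size_t hdr_len)
{
    assert(frame != NULL && used != NULL && pkt != NULL);
    if (hdr_len > pkt_len)
        return false;                       /* malformed packet */
    size_t payload = pkt_len - hdr_len;
    if (*used > frame_size || payload > frame_size - *used)
        return false;                       /* frame buffer would overflow */
    memcpy(frame + *used, pkt + hdr_len, payload);  /* the unavoidable copy */
    *used += payload;
    return true;
}
```

Since the payload has to be gathered into the frame this way regardless of
where the frame buffer lives, MMAP and USERPTR end up costing the same here.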

>> The problem is that memory mmaped to the userspace may
>> not be a part of the system memory. It often happens for
>> devices that use remap_pfn or dma_mmap_* to mmap the
>> memory to the userspace.
>> 
>> It is was empirically conjured the it is not possible
>> to access this kind of memory by the other device
>> without a platform-specific hacks or workarounds.

As I warned in the past: USERPTR was never meant to be used
for dev2dev transfers.

>> 
>> The DMABUF was introduced to help in such a case.
>> 
>> The basic short-term idea is to drop userptr support for
>> buffers that are MMAPed by other device.

You should, instead, just drop userptr support on devices where
DMA scatter/gather is not supported, and migrate all dev2dev
use cases to DMABUF.

>> 
>> The userptr will be used for memory allocated using malloc
>> (anonymous pages) or (maybe) mmaped files. There are of
>> course cache synchronization problems but there are
>> a lesser concern.
>> 
>> However this approach will work only for devices that
>> have its own IOMMU which can be configured to access system
>> memory. Otherwise, the memory has to copied anyway
>> to device's own buffers.
>> 
>> Moreover copying a large amount of data should not happen
>> in the kernel-space.
>> 
>> All the reasons make userptr an unreliable and complex to
>> implement feature.
>> 
>> So my rough-idea was to remove USERPTR support from kernel
>> drivers (if possible of course) and to provide an emulation
>> layer in the userspace code like libv4l2.
>> 
>> Please note that it is only a rough idea. Just brainstorming :)

> It is *too early* to start any discussion on this topic.
> Especially until DMABUF is mature enough to become a good
> alternative for userptr.

Looking at the whole picture, dropping USERPTR would only make
sense if it is broken for dev2user (or user2dev) transfers.

Dropping its use for dev2dev transfers makes sense once
DMABUF is implemented.

Yet, if some userspace application wants to abuse USERPTR in order
to use it for dev2dev transfers, there's not much that can be done at
the kernel level.

It makes sense to put a big warning in the V4L2 docs saying that this
is not officially supported and can cause all sorts of issues on
the machine/system.

Regards,
Mauro
Mauro Carvalho Chehab April 20, 2012, 2:48 p.m. UTC | #10
Em 20-04-2012 09:25, Tomasz Stanislawski escreveu:
> Hi Remi,

>> Now, I do realize that some devices cannot support USERPTR efficiently,
>> then they should not support USERPTR. 
> 
> The problem is not there is *NO* device that can handle USERPTR reliably.
> The can handle USERPTR generated by malloc or page cache (not sure).
> Memory mmaped from other devices, frameworks etc may or may not work.
> Even if the device has its IOMMU the DMA layer provides no generic way to
> transform from one device to the mapping in some other device.
> 
> It is done using platform-defendant hacks like extracting PFNs from mappings,
> hack-forming them into struct pages or scatterlists, mapping it and hoping
> that the memory is not going to release it in some other thread.
> 
> The only sure way is to copy data from userptr to MMAP buffer.

Everything you're talking about is related to the userptr abuse that happened
on embedded devices, where it was used for something it was never
meant for (dev2dev).

Until the DMABUF patches are applied, there's just one mode defined
in the V4L2 API for dev2dev: overlay mode[1].

Most embedded applications and drivers decided, instead of using
overlay mode, to abuse userptr to do dev2dev. As you've pointed out,
it was noticed in practice that this sometimes fails.

Yes, such abuse should be dropped, and DMABUF is the right way to
address it.

That doesn't mean that USERPTR should be dropped for the thing it was
originally created for: dev2user or user2dev.

Regards,
Mauro

[1] Even so, not all PC motherboards are capable of supporting the overlay mode:
it is known that several chipsets have problems in their DMA engines,
which cause data loss when a DMA transfer happens without passing through
the system main memory (PCI2PCI transfers). So, drivers check the PCI quirks
table to detect whether dev2dev is supported before exposing overlay mode to
userspace.
Laurent Pinchart April 21, 2012, 5:10 p.m. UTC | #11
Hi Rémi,

On Friday 20 April 2012 15:03:17 Rémi Denis-Courmont wrote:
> On Fri, 20 Apr 2012 14:25:01 +0200, Tomasz Stanislawski wrote:
> >>> The USERPTR simplifies userspace code but introduce
> >>> a lot of complexity problems for the kernel drivers
> >>> and frameworks.
> >> 
> >> It is not only a simplification. In some cases, USERPTR is the only I/O
> >> method that supports zero copy in pretty much any circumstance.
> > 
> > Only for devices that have its own IOMMU that can access system memory.
> 
> Newer versions of the UVC driver have USERTPR, and simingly gspca seems
> too. That is practically all USB capture devices... That might be
> irrelevant for a smartphone manufacturer. That is very relevant for desktop
> applications.
> 
> > Moreover the userptr must come from malloc or be a mmaped file.
> > The other case are drivers that touch memory using CPU in the kernel
> > space like VIVI or USB drivers.
> 
> I'd argue that USB is the most common case of V4L2 on the desktop...
> 
> >> When the user cannot reliably predict the maximum number of required
> >> buffers, predicts a value larger than the device will negotiate, or
> >> needs buffers to outlive STREAMOFF (?), MMAP requires memory copying.
> >> USERPTR does not.
> > 
> > What does outlive STREAMOFF means in this context?
> 
> Depending how your multimedia pipeline is built, it is plausible that the
> V4L2 source is shutdown (STREAMOFF then close()) before buffers coming from
> it are released/destroyed downstream. I might be wrong, but I would expect
> that V4L2 MMAP buffers become invalid after STREAMOFF+close()?

If the buffer is mmap()ed to userspace, it will not be freed before being 
munmap()ed.

> > Anyway, IMO allocation of the buffers at VIDIOC_REQBUFS was not the best
> > idea because it introduces an allocation overhead for negotiations of
> > the number of the buffers. An allocation at mmap was to late. There is a
> > need for some intermediate state between REQBUFS and mmap. The ioctl
> > BUF_PREPARE may help here.
> > 
> > Can you give me an example of a sane application is forced to negotiate
> > a larger number of buffers than it is actually going to use?
> 
> Outside the embedded world, the application typically does not know what
> the latency of the multimedia pipeline is. If the latency is not known, the
> number of buffers needed for zero copy cannot be precomputed for REQBUFS,
> say:
> 
> count = 1 + latency / frame interval.
> 
> Even for a trivial analog TV viewer application, lip synchronization
> requires picture frames to be bufferred to be long enough to account for
> the latency of the audio input, dejitter, filtering and audio output. Those
> values are usually not well determined at the time of requesting buffers
> from the video capture device. Also the application may want to play nice
> with PulseAudio. Then it will get very long audio buffers with very few
> audio periods... more latency.
> 
> It gets harder or outright impossible for frameworks dealing with
> complicated or arbitrary pipelines such as LibVLC or gstreamer. There is
> far too much unpredictability and variability downstream of the V4L2 source
> to estimate latency, and infer the number of buffers needed.

If I'm not mistaken VIDIOC_CREATE_BUFS allows you to create additional buffers 
at runtime. You can thus cope with a latency increase (provided that the 
allocation overhead isn't prohibitive, in which case you're stuck whatever 
method you select). Deleting buffers at runtime is currently not possible 
though.

> >> Now, I do realize that some devices cannot support USERPTR efficiently,
> >> then they should not support USERPTR.
> > 
> > The problem is not there is *NO* device that can handle USERPTR reliably.
> > The can handle USERPTR generated by malloc or page cache (not sure).
> > Memory mmaped from other devices, frameworks etc may or may not work.
> > Even if the device has its IOMMU the DMA layer provides no generic way to
> > transform from one device to the mapping in some other device.
> 
> I'm not saying that USERPTR should replace DMABUF. I'm saying USERPTR has
> advantages over MMAP that DMABUF does not seem to cover as yet (if only
> libv4l2 would not inhibit USERPTR...).
> 
> I'm definitely not saying that applications should rely on USERPTR being
> supported. We agree that not all devices can support USERPTR.
> 
> > The userptr has its niches were it works pretty well like Web cams or
> > VIVI.
> >
> > I am saying that if ever DMABUF becomes a good alternative for USERPTR
> > than maybe we should consider encouraging dropping USERPTR in the new
> > drivers as 'obsolete' feature and providing some emulation layer in
> > libv4l2 for legacy applications.
> 
> Sure.
Marek Szyprowski April 23, 2012, 7:50 a.m. UTC | #12
Hi Mauro,

On Friday, April 20, 2012 3:37 PM Mauro Carvalho Chehab wrote:

(snipped)

> >> So my rough-idea was to remove USERPTR support from kernel
> >> drivers (if possible of course) and to provide an emulation
> >> layer in the userspace code like libv4l2.
> >>
> >> Please note that it is only a rough idea. Just brainstorming :)
> 
> > It is *too early* to start any discussion on this topic.
> > Especially until DMABUF is mature enough to become a good
> > alternative for userptr.
> 
> Looking at the hole picture, dropping USERPTR would only make
> sense if it is broken on dev2user (or user2dev) transfers.
> 
> Dropping its usage on dev2dev transfers makes sense, after having
> DMABUF implemented.
> 
> Yet, if some userspace application wants to abuse of USERPTR in order
> to use it for dev2dev transfer, there's not much that can be done at
> Kernel level.
> 
> It makes sense to put a big warn at the V4L2 Docs telling that this
> is not officially supported and can cause all sorts of issues at
> the machine/system.

Please note that all current drivers which use videobuf/videobuf2-dma-contig
are able to use the userptr memory access method only with physically
contiguous memory. This means that in fact they work only with buffers which
come from other devices, and dev2dev transfers are the only possibility.
malloc()ed memory buffers are rejected.

Best regards
Mauro Carvalho Chehab April 23, 2012, 2 p.m. UTC | #13
Em 23-04-2012 07:50, Marek Szyprowski escreveu:
> Hi Mauro,
> 
> On Friday, April 20, 2012 3:37 PM Mauro Carvalho Chehab wrote:
> 
> (snipped)
> 
>>>> So my rough-idea was to remove USERPTR support from kernel
>>>> drivers (if possible of course) and to provide an emulation
>>>> layer in the userspace code like libv4l2.
>>>>
>>>> Please note that it is only a rough idea. Just brainstorming :)
>>
>>> It is *too early* to start any discussion on this topic.
>>> Especially until DMABUF is mature enough to become a good
>>> alternative for userptr.
>>
>> Looking at the hole picture, dropping USERPTR would only make
>> sense if it is broken on dev2user (or user2dev) transfers.
>>
>> Dropping its usage on dev2dev transfers makes sense, after having
>> DMABUF implemented.
>>
>> Yet, if some userspace application wants to abuse USERPTR in order
>> to use it for dev2dev transfer, there's not much that can be done at
>> Kernel level.
>>
>> It makes sense to put a big warning in the V4L2 docs stating that this
>> is not officially supported and can cause all sorts of issues on
>> the machine/system.
> 
> Please note that all current drivers which use videobuf/videobuf2-dma-contig
> are able to use userptr memory access method only with physically contiguous
> memory. 

Yes.

> This means that in fact they work only with buffers that come from other
> devices and dev2dev transfers are the only possibility. malloc()ed memory
> buffers are rejected.

Fragmented buffers can be detected, at Kernel level, and VB/VB2 can refuse
a fragmented memory when the hardware doesn't support it. However, checking
if the buffer is fragmented is not a safe way to detect that the buffer will
be used by a dev2dev transfer.

If the buffers are allocated with malloc() very soon after boot time,
or if some different way of allocating the buffers is used (like reducing the max
RAM area addressed by the kernel, or using CMA or a similar approach), it could be
possible to use videobuf(1/2)-dma-contig for userptr with user2dev/dev2user
transfers. This is actually done in some cases (for example, where
the capture device only supports contiguous buffers).

If, for some reason, the hardware doesn't support dev2dev transfers in a
reliable way, some other strategy should be used.

Regards,
Mauro

Patch

diff --git a/Documentation/DocBook/media/v4l/compat.xml b/Documentation/DocBook/media/v4l/compat.xml
index bce97c5..2a2083d 100644
--- a/Documentation/DocBook/media/v4l/compat.xml
+++ b/Documentation/DocBook/media/v4l/compat.xml
@@ -2523,6 +2523,10 @@  ioctls.</para>
         <listitem>
 	  <para>Selection API. <xref linkend="selection-api" /></para>
         </listitem>
+        <listitem>
+	  <para>Importing DMABUF file descriptors as a new IO method described
+	  in <xref linkend="dmabuf" />.</para>
+        </listitem>
       </itemizedlist>
     </section>
 
diff --git a/Documentation/DocBook/media/v4l/io.xml b/Documentation/DocBook/media/v4l/io.xml
index b815929..dc5979d 100644
--- a/Documentation/DocBook/media/v4l/io.xml
+++ b/Documentation/DocBook/media/v4l/io.xml
@@ -472,6 +472,162 @@  rest should be evident.</para>
       </footnote></para>
   </section>
 
+  <section id="dmabuf">
+    <title>Streaming I/O (DMA buffer importing)</title>
+
+    <note>
+      <title>Experimental</title>
+      <para>This is an <link linkend="experimental"> experimental </link>
+      interface and may change in the future.</para>
+    </note>
+
+<para>The DMABUF framework provides a generic means of sharing buffers between
+multiple devices. Device drivers that support DMABUF can export a DMA buffer
+to userspace as a file descriptor (known as the exporter role), import a DMA
+buffer from userspace using a file descriptor previously exported for a
+different or the same device (known as the importer role), or both. This
+section describes the DMABUF importer role API in V4L2.</para>
+
+<para>Input and output devices support the streaming I/O method when the
+<constant>V4L2_CAP_STREAMING</constant> flag in the
+<structfield>capabilities</structfield> field of &v4l2-capability; returned by
+the &VIDIOC-QUERYCAP; ioctl is set. Whether importing DMA buffers through
+DMABUF file descriptors is supported is determined by calling the
+&VIDIOC-REQBUFS; ioctl with the memory type set to
+<constant>V4L2_MEMORY_DMABUF</constant>.</para>
+
+    <para>This I/O method is dedicated to sharing DMA buffers between V4L and
+other APIs.  Buffers (planes) are allocated by a driver on behalf of the
+application, and exported to the application as file descriptors using an API
+specific to the allocator driver.  Only those file descriptors are exchanged;
+the descriptors and meta-information are passed in &v4l2-buffer; (or in
+&v4l2-plane; in the multi-planar API case).  The driver must be switched into
+DMABUF I/O mode by calling the &VIDIOC-REQBUFS; ioctl with the desired buffer type.
+No buffers (planes) are allocated beforehand; consequently they are not indexed
+and cannot be queried like mapped buffers with the
+<constant>VIDIOC_QUERYBUF</constant> ioctl.</para>
+
+    <example>
+      <title>Initiating streaming I/O with DMABUF file descriptors</title>
+
+      <programlisting>
+&v4l2-requestbuffers; reqbuf;
+
+memset (&amp;reqbuf, 0, sizeof (reqbuf));
+reqbuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+reqbuf.memory = V4L2_MEMORY_DMABUF;
+
+if (ioctl (fd, &VIDIOC-REQBUFS;, &amp;reqbuf) == -1) {
+	if (errno == EINVAL)
+		printf ("Video capturing or DMABUF streaming is not supported\n");
+	else
+		perror ("VIDIOC_REQBUFS");
+
+	exit (EXIT_FAILURE);
+}
+      </programlisting>
+    </example>
+
+    <para>Buffer (plane) file descriptors are passed on the fly with the &VIDIOC-QBUF;
+ioctl. In case of multiplanar buffers, every plane can be associated with a
+different DMABUF descriptor. Although buffers are commonly cycled, applications
+can pass a different DMABUF descriptor at each <constant>VIDIOC_QBUF</constant>
+call.</para>
+
+    <example>
+      <title>Queueing DMABUF using single plane API</title>
+
+      <programlisting>
+int buffer_queue(int v4lfd, int index, int dmafd)
+{
+	&v4l2-buffer; buf;
+
+	memset(&amp;buf, 0, sizeof buf);
+	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
+	buf.memory = V4L2_MEMORY_DMABUF;
+	buf.index = index;
+	buf.m.fd = dmafd;
+
+	if (ioctl (v4lfd, &VIDIOC-QBUF;, &amp;buf) == -1) {
+		perror ("VIDIOC_QBUF");
+		return -1;
+	}
+
+	return 0;
+}
+      </programlisting>
+    </example>
+
+    <example>
+      <title>Queueing DMABUF using multi plane API</title>
+
+      <programlisting>
+int buffer_queue_mp(int v4lfd, int index, int dmafd[], int n_planes)
+{
+	&v4l2-buffer; buf;
+	&v4l2-plane; planes[VIDEO_MAX_PLANES];
+	int i;
+
+	memset(&amp;buf, 0, sizeof buf);
+	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
+	buf.memory = V4L2_MEMORY_DMABUF;
+	buf.index = index;
+	buf.m.planes = planes;
+	buf.length = n_planes;
+
+	memset(&amp;planes, 0, sizeof planes);
+
+	for (i = 0; i &lt; n_planes; ++i)
+		buf.m.planes[i].m.fd = dmafd[i];
+
+	if (ioctl (v4lfd, &VIDIOC-QBUF;, &amp;buf) == -1) {
+		perror ("VIDIOC_QBUF");
+		return -1;
+	}
+
+	return 0;
+}
+      </programlisting>
+    </example>
+
+    <para>Filled or displayed buffers are dequeued with the
+&VIDIOC-DQBUF; ioctl. The driver can unpin the buffer at any
+time between the completion of the DMA and this ioctl. The memory is
+also unpinned when &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; is called, or
+when the device is closed.</para>
+
+    <para>For capturing applications it is customary to enqueue a
+number of empty buffers, to start capturing and enter the read loop.
+Here the application waits until a filled buffer can be dequeued, and
+re-enqueues the buffer when the data is no longer needed. Output
+applications fill and enqueue buffers; when enough buffers are stacked
+up, output is started. In the write loop, when the application
+runs out of free buffers it must wait until an empty buffer can be
+dequeued and reused. Two methods exist to suspend execution of the
+application until one or more buffers can be dequeued. By default
+<constant>VIDIOC_DQBUF</constant> blocks when no buffer is in the
+outgoing queue. When the <constant>O_NONBLOCK</constant> flag was
+given to the &func-open; function, <constant>VIDIOC_DQBUF</constant>
+returns immediately with an &EAGAIN; when no buffer is available. The
+&func-select; or &func-poll; functions are always available.</para>
+
+    <para>To start and stop capturing or output applications call the
+&VIDIOC-STREAMON; and &VIDIOC-STREAMOFF; ioctls. Note that
+<constant>VIDIOC_STREAMOFF</constant> removes all buffers from both queues and
+unlocks/unpins all buffers as a side effect. Since there is no notion of doing
+anything "now" on a multitasking system, if an application needs to synchronize
+with another event it should examine the &v4l2-buffer;
+<structfield>timestamp</structfield> of captured buffers, or set the field
+before enqueuing buffers for output.</para>
+
+    <para>Drivers implementing DMABUF importing I/O must support the
+<constant>VIDIOC_REQBUFS</constant>, <constant>VIDIOC_QBUF</constant>,
+<constant>VIDIOC_DQBUF</constant>, <constant>VIDIOC_STREAMON</constant> and
+<constant>VIDIOC_STREAMOFF</constant> ioctls, and the <function>select()</function>
+and <function>poll()</function> functions.</para>
+
+  </section>
+
   <section id="async">
     <title>Asynchronous I/O</title>
 
@@ -671,6 +827,14 @@  memory, set by the application. See <xref linkend="userp" /> for details.
 	    <structname>v4l2_buffer</structname> structure.</entry>
 	  </row>
 	  <row>
+	    <entry></entry>
+	    <entry>int</entry>
+	    <entry><structfield>fd</structfield></entry>
+	    <entry>For the single-plane API and when
+<structfield>memory</structfield> is <constant>V4L2_MEMORY_DMABUF</constant> this
+is the file descriptor associated with a DMABUF buffer.</entry>
+	  </row>
+	  <row>
 	    <entry>__u32</entry>
 	    <entry><structfield>length</structfield></entry>
 	    <entry></entry>
@@ -746,6 +910,15 @@  should set this to 0.</entry>
 	      </entry>
 	  </row>
 	  <row>
+	    <entry></entry>
+	    <entry>int</entry>
+	    <entry><structfield>fd</structfield></entry>
+	    <entry>When the memory type in the containing &v4l2-buffer; is
+		<constant>V4L2_MEMORY_DMABUF</constant>, this is a file
+		descriptor associated with a DMABUF buffer, similar to the
+		<structfield>fd</structfield> field in &v4l2-buffer;.</entry>
+	  </row>
+	  <row>
 	    <entry>__u32</entry>
 	    <entry><structfield>data_offset</structfield></entry>
 	    <entry></entry>
@@ -980,6 +1153,12 @@  pointer</link> I/O.</entry>
 	    <entry>3</entry>
 	    <entry>[to do]</entry>
 	  </row>
+	  <row>
+	    <entry><constant>V4L2_MEMORY_DMABUF</constant></entry>
+	    <entry>4</entry>
+	    <entry>The buffer is used for <link linkend="dmabuf">DMA shared
+buffer</link> I/O.</entry>
+	  </row>
 	</tbody>
       </tgroup>
     </table>
diff --git a/Documentation/DocBook/media/v4l/vidioc-create-bufs.xml b/Documentation/DocBook/media/v4l/vidioc-create-bufs.xml
index 73ae8a6..adc92be 100644
--- a/Documentation/DocBook/media/v4l/vidioc-create-bufs.xml
+++ b/Documentation/DocBook/media/v4l/vidioc-create-bufs.xml
@@ -98,6 +98,7 @@  information.</para>
 	    <entry><structfield>memory</structfield></entry>
 	    <entry>Applications set this field to
 <constant>V4L2_MEMORY_MMAP</constant> or
+<constant>V4L2_MEMORY_DMABUF</constant> or
 <constant>V4L2_MEMORY_USERPTR</constant>.</entry>
 	  </row>
 	  <row>
diff --git a/Documentation/DocBook/media/v4l/vidioc-qbuf.xml b/Documentation/DocBook/media/v4l/vidioc-qbuf.xml
index 9caa49a..cb5f5ff 100644
--- a/Documentation/DocBook/media/v4l/vidioc-qbuf.xml
+++ b/Documentation/DocBook/media/v4l/vidioc-qbuf.xml
@@ -112,6 +112,21 @@  they cannot be swapped out to disk. Buffers remain locked until
 dequeued, until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl is
 called, or until the device is closed.</para>
 
+    <para>To enqueue a <link linkend="dmabuf">DMABUF</link> buffer applications
+set the <structfield>memory</structfield> field to
+<constant>V4L2_MEMORY_DMABUF</constant> and the <structfield>m.fd</structfield>
+to a file descriptor associated with a DMABUF buffer. When the multi-planar API is
+used, the <structfield>m.fd</structfield> fields of the passed array of &v4l2-plane;
+have to be used instead. When <constant>VIDIOC_QBUF</constant> is called with a
+pointer to this structure the driver sets the
+<constant>V4L2_BUF_FLAG_QUEUED</constant> flag and clears the
+<constant>V4L2_BUF_FLAG_MAPPED</constant> and
+<constant>V4L2_BUF_FLAG_DONE</constant> flags in the
+<structfield>flags</structfield> field, or it returns an error code.  This
+ioctl locks the buffer. Buffers remain locked until dequeued,
+until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl is called, or until the
+device is closed.</para>
+
     <para>Applications call the <constant>VIDIOC_DQBUF</constant>
 ioctl to dequeue a filled (capturing) or displayed (output) buffer
 from the driver's outgoing queue. They just set the
diff --git a/Documentation/DocBook/media/v4l/vidioc-reqbufs.xml b/Documentation/DocBook/media/v4l/vidioc-reqbufs.xml
index 7be4b1d..e3e709b 100644
--- a/Documentation/DocBook/media/v4l/vidioc-reqbufs.xml
+++ b/Documentation/DocBook/media/v4l/vidioc-reqbufs.xml
@@ -48,28 +48,30 @@ 
   <refsect1>
     <title>Description</title>
 
-    <para>This ioctl is used to initiate <link linkend="mmap">memory
-mapped</link> or <link linkend="userp">user pointer</link>
-I/O. Memory mapped buffers are located in device memory and must be
-allocated with this ioctl before they can be mapped into the
-application's address space. User buffers are allocated by
-applications themselves, and this ioctl is merely used to switch the
-driver into user pointer I/O mode and to setup some internal structures.</para>
+<para>This ioctl is used to initiate <link linkend="mmap">memory mapped</link>,
+<link linkend="userp">user pointer</link> or <link
+linkend="dmabuf">DMABUF</link> based I/O.  Memory mapped buffers are located in
+device memory and must be allocated with this ioctl before they can be mapped
+into the application's address space. User buffers are allocated by
+applications themselves, and this ioctl is merely used to switch the driver
+into user pointer I/O mode and to set up some internal structures.
+Similarly, DMABUF buffers are allocated by applications through a device
+driver, and this ioctl only configures the driver into DMABUF I/O mode without
+performing any direct allocation.</para>
 
-    <para>To allocate device buffers applications initialize all
-fields of the <structname>v4l2_requestbuffers</structname> structure.
-They set the <structfield>type</structfield> field to the respective
-stream or buffer type, the <structfield>count</structfield> field to
-the desired number of buffers, <structfield>memory</structfield>
-must be set to the requested I/O method and the <structfield>reserved</structfield> array
-must be zeroed. When the ioctl
-is called with a pointer to this structure the driver will attempt to allocate
-the requested number of buffers and it stores the actual number
-allocated in the <structfield>count</structfield> field. It can be
-smaller than the number requested, even zero, when the driver runs out
-of free memory. A larger number is also possible when the driver requires
-more buffers to function correctly. For example video output requires at least two buffers,
-one displayed and one filled by the application.</para>
+    <para>To allocate device buffers applications initialize all fields of the
+<structname>v4l2_requestbuffers</structname> structure.  They set the
+<structfield>type</structfield> field to the respective stream or buffer type,
+the <structfield>count</structfield> field to the desired number of buffers,
+<structfield>memory</structfield> must be set to the requested I/O method and
+the <structfield>reserved</structfield> array must be zeroed. When the ioctl is
+called with a pointer to this structure the driver will attempt to allocate the
+requested number of buffers and it stores the actual number allocated in the
+<structfield>count</structfield> field. It can be smaller than the number
+requested, even zero, when the driver runs out of free memory. A larger number
+is also possible when the driver requires more buffers to function correctly.
+For example video output requires at least two buffers, one displayed and one
+filled by the application.</para>
     <para>When the I/O method is not supported the ioctl
 returns an &EINVAL;.</para>
 
@@ -102,7 +104,8 @@  as the &v4l2-format; <structfield>type</structfield> field. See <xref
 	    <entry>&v4l2-memory;</entry>
 	    <entry><structfield>memory</structfield></entry>
 	    <entry>Applications set this field to
-<constant>V4L2_MEMORY_MMAP</constant> or
+<constant>V4L2_MEMORY_MMAP</constant>,
+<constant>V4L2_MEMORY_DMABUF</constant> or
 <constant>V4L2_MEMORY_USERPTR</constant>.</entry>
 	  </row>
 	  <row>