[1/2,RESEND] media: v4l2-mem2mem: allow device run without buf

Message ID 20230704040044.681850-2-randy.li@synaptics.com
State New
Series Improve V4L2 M2M job scheduler

Commit Message

Hsia-Jun Li July 4, 2023, 4 a.m. UTC
From: Randy Li <ayaka@soulik.info>

For decoders that support Dynamic Resolution Change,
we don't need to allocate any CAPTURE or graphics buffers
at the initial CAPTURE setup step.

We need to let the device run, or we can't get that
metadata.

Signed-off-by: Randy Li <ayaka@soulik.info>
---
 drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
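
A minimal sketch of what the two-line change to __v4l2_m2m_try_queue() does, written as a standalone user-space model rather than kernel code. The structures are simplified stand-ins (the real fields are m2m_ctx->out_q_ctx.q.streaming and m2m_ctx->out_q_ctx.buffered), and the helper names can_schedule_old() and can_schedule_new() are hypothetical, introduced only for illustration:

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures; only the fields that
 * the scheduling check reads are modelled here. */
struct queue_ctx {
	bool streaming;	/* models q.streaming of the vb2 queue */
	bool buffered;	/* models the queue context's buffered flag */
};

struct m2m_ctx {
	struct queue_ctx out_q_ctx;
	struct queue_ctx cap_q_ctx;
};

/* Check before the patch: both queues must be streaming. */
static bool can_schedule_old(const struct m2m_ctx *ctx)
{
	return ctx->out_q_ctx.streaming && ctx->cap_q_ctx.streaming;
}

/* Check after the patch: a queue passes if it streams or is buffered. */
static bool can_schedule_new(const struct m2m_ctx *ctx)
{
	return (ctx->out_q_ctx.streaming || ctx->out_q_ctx.buffered) &&
	       (ctx->cap_q_ctx.streaming || ctx->cap_q_ctx.buffered);
}
```

With OUTPUT streaming and CAPTURE marked buffered but not yet streaming, the old check refuses to schedule while the new one lets the job through, which is the situation the commit message describes.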

Comments

Nicolas Dufresne July 7, 2023, 7:14 p.m. UTC | #1
Hi Randy,

On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
> From: Randy Li <ayaka@soulik.info>
> 
> For the decoder supports Dynamic Resolution Change,
> we don't need to allocate any CAPTURE or graphics buffer
> for them at inital CAPTURE setup step.
> 
> We need to make the device run or we can't get those
> metadata.
> 
> Signed-off-by: Randy Li <ayaka@soulik.info>
> ---
>  drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> index 0cc30397fbad..c771aba42015 100644
> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>  
>  	dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>  
> -	if (!m2m_ctx->out_q_ctx.q.streaming
> -	    || !m2m_ctx->cap_q_ctx.q.streaming) {
> +	if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> +	    || !(m2m_ctx->cap_q_ctx.q.streaming
> +		 || m2m_ctx->cap_q_ctx.buffered)) {

I have two patches with similar goals in my wave5 tree. They will be easier to
upstream with an actual user, though I'm probably a month or two away from
submitting this driver again.

https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251

Sebastian and I authored this without giving it much thought, but we believe
it massively simplifies our handling of DRC (dynamic resolution change).

The main difference is that we added ignore_streaming to the ctx, so that
drivers can opt in to this mode of operation, thinking it would avoid any
potential side effects in drivers that aren't prepared for it. We didn't want
to tie it to buffered; this is open to discussion, of course. We do use
buffered on both queues, and use a slightly more advanced job_ready function
that takes our driver state into account.

In short, Sebastian and I agree this small change is the right direction; we
simply have a different implementation. I can send it as an RFC if anyone
believes it would be useful now (even without a user).

>  		dprintk("Streaming needs to be on for both queues\n");
>  		return;
>  	}
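
The opt-in approach described above could look roughly like the sketch below. This is not the code from the two linked commits: the structure layout, the flattened field names, and the example dec_job_ready() policy are assumptions made for illustration. Only the ignore_streaming flag and the idea of a job_ready callback that looks at driver state come from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened model of an m2m context. */
struct sketch_ctx {
	bool out_streaming;
	bool cap_streaming;
	bool ignore_streaming;		/* opt-in flag set by the driver */
	bool (*job_ready)(void *priv);	/* driver callback, may be NULL */
	void *priv;
};

/* Core check: skip the streaming test only for drivers that opted in,
 * then defer to the driver's own job_ready policy. */
static bool sketch_can_schedule(const struct sketch_ctx *ctx)
{
	if (!ctx->ignore_streaming &&
	    (!ctx->out_streaming || !ctx->cap_streaming))
		return false;
	if (ctx->job_ready && !ctx->job_ready(ctx->priv))
		return false;
	return true;
}

/* Hypothetical decoder state used by the example callback. */
enum dec_state { DEC_INIT_SEQ, DEC_DECODING };

struct dec_priv {
	enum dec_state state;
	int src_bufs;	/* queued bitstream buffers */
	int dst_bufs;	/* queued picture buffers */
};

/* Example policy: parsing the sequence header needs only bitstream,
 * while decoding also needs a destination buffer. */
static bool dec_job_ready(void *priv)
{
	const struct dec_priv *d = priv;

	if (d->src_bufs == 0)
		return false;
	if (d->state == DEC_INIT_SEQ)
		return true;
	return d->dst_bufs > 0;
}
```

Drivers that leave ignore_streaming unset keep the existing behaviour, which is the point of making the mode opt-in.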
Sebastian Fricke July 12, 2023, 9:44 a.m. UTC | #2
Hey Tomasz,

On 12.07.2023 09:31, Tomasz Figa wrote:
>On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
>> Hi Randy,
>>
>> On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
>> > From: Randy Li <ayaka@soulik.info>
>> >
>> > For the decoder supports Dynamic Resolution Change,
>> > we don't need to allocate any CAPTURE or graphics buffer
>> > for them at inital CAPTURE setup step.
>> >
>> > We need to make the device run or we can't get those
>> > metadata.
>> >
>> > Signed-off-by: Randy Li <ayaka@soulik.info>
>> > ---
>> >  drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
>> >  1 file changed, 3 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> > index 0cc30397fbad..c771aba42015 100644
>> > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>> > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>> > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>> >
>> >  	dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>> >
>> > -	if (!m2m_ctx->out_q_ctx.q.streaming
>> > -	    || !m2m_ctx->cap_q_ctx.q.streaming) {
>> > +	if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
>> > +	    || !(m2m_ctx->cap_q_ctx.q.streaming
>> > +		 || m2m_ctx->cap_q_ctx.buffered)) {
>>
>> I have a two atches with similar goals in my wave5 tree. It will be easier to
>> upstream with an actual user, though, I'm probably a month or two away from
>> submitting this driver again.
>>
>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
>>
>
>While I'm not going to NAK this series or those 2 patches if you send
>them, I'm not really convinced that adding more and more complexity to
>the mem2mem helpers is a good idea, especially since all of those seem
>to be only needed by stateful video decoders.
>
>The mem2mem framework started as a set of helpers to eliminate boilerplate
>from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
>run 1 processing job on them and then return both of them to userspace,
>and I think it should stay like this.
>
>I think we're strongly in need of a stateful video decoder framework that
>would actually address the exact problems that those have rather than
>bending something that wasn't designed with them in mind to work around the
>differences.

Thanks for the feedback.

I have recently discussed how we could approach creating a framework for
the codecs side, with Hans Verkuil and Nicolas Dufresne.

The first step would be to come up with a list of requirements for that
framework and expected future needs; maybe we can start a public
discussion on the mailing list to generate such a list.
But all in all this endeavor will probably require quite a bit of time
and effort. Do you think we could modify M2M a bit for our use case,
and then, when we are in the process of creating the new framework,
think about simplifying the M2M framework again?

>
>Best regards,
>Tomasz

Greetings,
Sebastian

>
>> Sebastien and I authored this without giving it much thought, but we believe
>> this massively simplify our handling of DRC (dynamic resolution change).
>>
>> The main difference, is that we added ignore_streaming to the ctx, so that
>> drivers can opt-in the mode of operation. Thinking it would avoid any potential
>> side effects in drivers that aren't prepared to that. We didn't want to tied it
>> up to buffered, this is open to discussion of course, we do use buffered on both
>> queues and use a slightly more advance job_ready function, that take into
>> account our driver state.
>>
>> In short, Sebastien and I agree this small change is the right direction, we
>> simply have a different implementation. I can send it as RFC if one believe its
>> would be useful now (even without a user).
>>
>> >  		dprintk("Streaming needs to be on for both queues\n");
>> >  		return;
>> >  	}
>>
Tomasz Figa July 13, 2023, 3:13 a.m. UTC | #3
On Wed, Jul 12, 2023 at 6:44 PM Sebastian Fricke
<sebastian.fricke@collabora.com> wrote:
>
> Hey Tomasz,
>
> On 12.07.2023 09:31, Tomasz Figa wrote:
> >On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
> >> Hi Randy,
> >>
> >> On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
> >> > From: Randy Li <ayaka@soulik.info>
> >> >
> >> > For the decoder supports Dynamic Resolution Change,
> >> > we don't need to allocate any CAPTURE or graphics buffer
> >> > for them at inital CAPTURE setup step.
> >> >
> >> > We need to make the device run or we can't get those
> >> > metadata.
> >> >
> >> > Signed-off-by: Randy Li <ayaka@soulik.info>
> >> > ---
> >> >  drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> >> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> > index 0cc30397fbad..c771aba42015 100644
> >> > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> >> > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> >> >
> >> >    dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
> >> >
> >> > -  if (!m2m_ctx->out_q_ctx.q.streaming
> >> > -      || !m2m_ctx->cap_q_ctx.q.streaming) {
> >> > +  if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> >> > +      || !(m2m_ctx->cap_q_ctx.q.streaming
> >> > +           || m2m_ctx->cap_q_ctx.buffered)) {
> >>
> >> I have a two atches with similar goals in my wave5 tree. It will be easier to
> >> upstream with an actual user, though, I'm probably a month or two away from
> >> submitting this driver again.
> >>
> >> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
> >> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
> >>
> >
> >While I'm not going to NAK this series or those 2 patches if you send
> >them, I'm not really convinced that adding more and more complexity to
> >the mem2mem helpers is a good idea, especially since all of those seem
> >to be only needed by stateful video decoders.
> >
> >The mem2mem framework started as a set of helpers to eliminate boiler
> >plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
> >run 1 processing job on them and then return both of the to the userspace
> >and I think it should stay like this.
> >
> >I think we're strongly in need of a stateful video decoder framework that
> >would actually address the exact problems that those have rather than
> >bending something that wasn't designed with them in mind to work around the
> >differences.
>
> Thanks for the feedback.
>
> I have recently discussed how we could approach creating a framework for
> the codecs side, with Hans Verkuil and Nicolas Dufresne.

That's great to hear, thanks. :)

>
> The first step we would have to do is come up with a list of
> requirements for that framework and expected future needs, maybe we can
> start a public discussion on the mailing list to generate a list like
> that.

Makes sense. Let me CC some ChromeOS folks working on video codec
drivers these days.

> But all in all this endeavor will probably require quite a bit of time
> and effort, do you think we could modify M2M a bit for our use case and
> then when we are in the process of creating the new framework, we could
> maybe think about simplifying the M2M framework again?

Sure, as I said, I'm not NAKing this series.

>
> >
> >Best regards,
> >Tomasz
>
> Greetings,
> Sebastian
>
> >
> >> Sebastien and I authored this without giving it much thought, but we believe
> >> this massively simplify our handling of DRC (dynamic resolution change).
> >>
> >> The main difference, is that we added ignore_streaming to the ctx, so that
> >> drivers can opt-in the mode of operation. Thinking it would avoid any potential
> >> side effects in drivers that aren't prepared to that. We didn't want to tied it
> >> up to buffered, this is open to discussion of course, we do use buffered on both
> >> queues and use a slightly more advance job_ready function, that take into
> >> account our driver state.
> >>
> >> In short, Sebastien and I agree this small change is the right direction, we
> >> simply have a different implementation. I can send it as RFC if one believe its
> >> would be useful now (even without a user).
> >>
> >> >            dprintk("Streaming needs to be on for both queues\n");
> >> >            return;
> >> >    }
> >>
Nicolas Dufresne July 17, 2023, 2 p.m. UTC | #4
On Wednesday, 12 July 2023 at 09:31 +0000, Tomasz Figa wrote:
> On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
> > Hi Randy,
> > 
> > On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
> > > From: Randy Li <ayaka@soulik.info>
> > > 
> > > For the decoder supports Dynamic Resolution Change,
> > > we don't need to allocate any CAPTURE or graphics buffer
> > > for them at inital CAPTURE setup step.
> > > 
> > > We need to make the device run or we can't get those
> > > metadata.
> > > 
> > > Signed-off-by: Randy Li <ayaka@soulik.info>
> > > ---
> > >  drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > index 0cc30397fbad..c771aba42015 100644
> > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > >  
> > >  	dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
> > >  
> > > -	if (!m2m_ctx->out_q_ctx.q.streaming
> > > -	    || !m2m_ctx->cap_q_ctx.q.streaming) {
> > > +	if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> > > +	    || !(m2m_ctx->cap_q_ctx.q.streaming
> > > +		 || m2m_ctx->cap_q_ctx.buffered)) {
> > 
> > I have a two atches with similar goals in my wave5 tree. It will be easier to
> > upstream with an actual user, though, I'm probably a month or two away from
> > submitting this driver again.
> > 
> > https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
> > https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
> > 
> 
> While I'm not going to NAK this series or those 2 patches if you send
> them, I'm not really convinced that adding more and more complexity to
> the mem2mem helpers is a good idea, especially since all of those seem
> to be only needed by stateful video decoders.
> 
> The mem2mem framework started as a set of helpers to eliminate boiler
> plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
> run 1 processing job on them and then return both of the to the userspace
> and I think it should stay like this.

It's a bit late to bring up that argument. It should have been raised a couple
of years ago (before I even started helping with these codecs). Now that all
the newly written stateful decoders use this framework, it is logical to keep
reducing the boilerplate for these too. In my opinion, the job_ready()
callback should have been a lot more flexible from the start, and allowing
drivers to make it more powerful does not really add that much complexity.

Speaking of complexity, driving the output manually (outside of the job
workqueue) during sequence initialization is way more complex and risky than
this. Finally, sticking with a 1:1 pattern means encoders, detilers, image
enhancement that reduces framerate, etc. would all be unwelcome to use this.
Which, in short, means no one should even use this.

> 
> I think we're strongly in need of a stateful video decoder framework that
> would actually address the exact problems that those have rather than
> bending something that wasn't designed with them in mind to work around the
> differences.

The bend is already there; of course I'd be happy to help with any new
framework, especially for modern stateless codecs, where there is a need to do
better scheduling. Just ping me if you have some effort starting. I don't
currently have the budget or bandwidth to write new drivers or port existing
drivers to a newly written framework.

Nicolas


[...]
Hsia-Jun Li July 21, 2023, 8:56 a.m. UTC | #5
On 7/17/23 22:00, Nicolas Dufresne wrote:
> On Wednesday, 12 July 2023 at 09:31 +0000, Tomasz Figa wrote:
>> On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
>>> Hi Randy,
>>>
>>> On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
>>>> From: Randy Li <ayaka@soulik.info>
>>>>
>>>> For the decoder supports Dynamic Resolution Change,
>>>> we don't need to allocate any CAPTURE or graphics buffer
>>>> for them at inital CAPTURE setup step.
>>>>
>>>> We need to make the device run or we can't get those
>>>> metadata.
>>>>
>>>> Signed-off-by: Randy Li <ayaka@soulik.info>
>>>> ---
>>>>   drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> index 0cc30397fbad..c771aba42015 100644
>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>> @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>>>
>>>>    dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>>>>
>>>> - if (!m2m_ctx->out_q_ctx.q.streaming
>>>> -     || !m2m_ctx->cap_q_ctx.q.streaming) {
>>>> + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
>>>> +     || !(m2m_ctx->cap_q_ctx.q.streaming
>>>> +          || m2m_ctx->cap_q_ctx.buffered)) {
>>>
>>> I have a two atches with similar goals in my wave5 tree. It will be easier to
>>> upstream with an actual user, though, I'm probably a month or two away from
>>> submitting this driver again.
>>>
>>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
>>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
>>>
>>
>> While I'm not going to NAK this series or those 2 patches if you send
>> them, I'm not really convinced that adding more and more complexity to
>> the mem2mem helpers is a good idea, especially since all of those seem
>> to be only needed by stateful video decoders.
>>
>> The mem2mem framework started as a set of helpers to eliminate boiler
>> plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
>> run 1 processing job on them and then return both of the to the userspace
>> and I think it should stay like this.
> 
> Its a bit late to try and bring that argument. It should have been raised couple
> of years ago (before I even started helping with these CODEC). Now that all the
> newly written stately decoders uses this framework, it is logical to keep
> reducing the boiler plate for these too. In my opinion, the job_ready()
> callback, should have been a lot more flexible from the start. And allowing
> driver to make it more powerful does not really add that much complexity.
> 
> Speaking of complexity, driving the output manually (outside of the job
> workqueue) during sequence initialization is a way more complex and risky then
> this. Finally, sticking with 1:1 pattern means encoder, detilers, image
> enhancement reducing framerate, etc. would all be unwelcome to use this. Which
> in short, means no one should even use this.
> 
I think the following things are m2m, but they would be hard to represent in
the current m2m framework:
1. N:1 compositor (it may be implemented as a loop running a 2:1 compositor).
2. AV1 film grain
3. HDR with dynamic metadata to SDR

The video use cases that fit the m2m model could be just a little less complex
than an ISP or camera pipeline. The only difference is that an ISP won't have
multiple contexts running at the same time.
If we could design a model for the video encoder, I think we could solve
most of the problems.
A video encoder would have:
1. input graphics buffer
2. reconstruction graphics buffer
3. motion vector cache buffer (optional)
4. coded bitstream output
5. encoding statistics report
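
As a rough illustration of the list above, the five buffers could be grouped into one per-job descriptor. This is only a sketch: struct enc_buf, struct enc_job, and enc_job_ready() are hypothetical names, and treating everything except the motion vector cache as mandatory is an assumption read from the "(optional)" marker in the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer handle; a real driver would use vb2 buffers. */
struct enc_buf {
	void *vaddr;
	size_t size;
};

/* One encode job carries more than a single source/destination pair. */
struct enc_job {
	struct enc_buf input;		/* 1. input graphics buffer */
	struct enc_buf recon;		/* 2. reconstruction graphics buffer */
	struct enc_buf mv_cache;	/* 3. motion vector cache (optional) */
	struct enc_buf bitstream;	/* 4. coded bitstream output */
	struct enc_buf stats;		/* 5. encoding statistics report */
};

static bool enc_buf_valid(const struct enc_buf *b)
{
	return b->vaddr != NULL && b->size > 0;
}

/* A job can run once every mandatory buffer is present; the optional
 * motion vector cache is deliberately not checked. */
static bool enc_job_ready(const struct enc_job *job)
{
	return enc_buf_valid(&job->input) &&
	       enc_buf_valid(&job->recon) &&
	       enc_buf_valid(&job->bitstream) &&
	       enc_buf_valid(&job->stats);
}
```

How such buffers would map onto V4L2 queues is exactly the open design question; the sketch only shows that a single job needs more than the 1:1 OUTPUT/CAPTURE pairing provides.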
>>
>> I think we're strongly in need of a stateful video decoder framework that
>> would actually address the exact problems that those have rather than
>> bending something that wasn't designed with them in mind to work around the
>> differences.
> 
> The bend is already there, of course I'd be happy to help with any new
> framework. Specially on modern stateless, were there is a need to do better
> scheduling.
I didn't know about the scheduling problem with stateless codecs. Are they
supposed to be in the job queue when the buffers that the DPB requires are
owned by the driver and its registers are prepared, except for the trigger bit?
> Just ping me if you have some effort starting, I don't currently
> have a budget or bandwidth to write new drivers or port existing drivers them on
> a newly written framework.
> 
> Nicolas
> 
> 
> [...]
Nicolas Dufresne July 21, 2023, 4:22 p.m. UTC | #6
On Friday, 21 July 2023 at 16:56 +0800, Hsia-Jun Li wrote:
> 
> On 7/17/23 22:00, Nicolas Dufresne wrote:
> > On Wednesday, 12 July 2023 at 09:31 +0000, Tomasz Figa wrote:
> > > On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
> > > > Hi Randy,
> > > > 
> > > > On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
> > > > > From: Randy Li <ayaka@soulik.info>
> > > > > 
> > > > > For the decoder supports Dynamic Resolution Change,
> > > > > we don't need to allocate any CAPTURE or graphics buffer
> > > > > for them at inital CAPTURE setup step.
> > > > > 
> > > > > We need to make the device run or we can't get those
> > > > > metadata.
> > > > > 
> > > > > Signed-off-by: Randy Li <ayaka@soulik.info>
> > > > > ---
> > > > >   drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
> > > > >   1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > index 0cc30397fbad..c771aba42015 100644
> > > > > --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
> > > > > @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
> > > > > 
> > > > >    dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
> > > > > 
> > > > > - if (!m2m_ctx->out_q_ctx.q.streaming
> > > > > -     || !m2m_ctx->cap_q_ctx.q.streaming) {
> > > > > + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
> > > > > +     || !(m2m_ctx->cap_q_ctx.q.streaming
> > > > > +          || m2m_ctx->cap_q_ctx.buffered)) {
> > > > 
> > > > I have a two atches with similar goals in my wave5 tree. It will be easier to
> > > > upstream with an actual user, though, I'm probably a month or two away from
> > > > submitting this driver again.
> > > > 
> > > > https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
> > > > https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
> > > > 
> > > 
> > > While I'm not going to NAK this series or those 2 patches if you send
> > > them, I'm not really convinced that adding more and more complexity to
> > > the mem2mem helpers is a good idea, especially since all of those seem
> > > to be only needed by stateful video decoders.
> > > 
> > > The mem2mem framework started as a set of helpers to eliminate boiler
> > > plate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
> > > run 1 processing job on them and then return both of the to the userspace
> > > and I think it should stay like this.
> > 
> > Its a bit late to try and bring that argument. It should have been raised couple
> > of years ago (before I even started helping with these CODEC). Now that all the
> > newly written stately decoders uses this framework, it is logical to keep
> > reducing the boiler plate for these too. In my opinion, the job_ready()
> > callback, should have been a lot more flexible from the start. And allowing
> > driver to make it more powerful does not really add that much complexity.
> > 
> > Speaking of complexity, driving the output manually (outside of the job
> > workqueue) during sequence initialization is a way more complex and risky then
> > this. Finally, sticking with 1:1 pattern means encoder, detilers, image
> > enhancement reducing framerate, etc. would all be unwelcome to use this. Which
> > in short, means no one should even use this.
> > 
> I think those things are m2m, but it would be hard to present in current 
> m2m framework:
> 1. N:1 compositor(It may be implemented as a loop running 2:1 compositor).

Correct, only SRC/DST/BG type of blitters can be supported for compositing,
which is quite limiting. Currently there is no way to make an N:1 M2M, as M2M
instances are implemented at the video node layer, and not at the MC layer. This
is an entirely new subject and API design space to tackle (same goes for 1:N,
like multi scalers, SVC decoders etc.).

> 2. AV1 film gain

For AV1/HEVC film grain, it is handled similarly to inline converters and
scalers. The driver secretly allocates the reference frames, and post-processes
into the user-visible buffers. It breaks some assumptions made by most
protected memory setups though, as not all allocations are user driven, meaning
the decoder needs to know whether it is secure or not. Secure memory is also
another API design space to tackle.

> 3. HDR with dynamic meta data to SDR

True, but easy to design around the stateless model. I'm not worried about these.

> 
> The video things fix for m2m model could be just a little less complex 
> than ISP or camera pipeline. The only difference is just ISP won't have 
> multiple contexts running at the same time.

I thought that having the kernel schedule ISP reprocessing jobs (which requires
instances) would be nice. But this can only be solved after we have solved the
N:N use cases of m2m (with multiple instances).

> If we could design a model for the video encoder I think we could solve 
> the most of problems.
> A video encoder would have:
> 1. input graphics buffer
> 2. reconstruction graphics buffer
> 3. motion vector cache buffer(optional)
> 4. coded bitstream output
> 5. encoding statistic report
> > > 
> > > I think we're strongly in need of a stateful video decoder framework that
> > > would actually address the exact problems that those have rather than
> > > bending something that wasn't designed with them in mind to work around the
> > > differences.
> > 
> > The bend is already there, of course I'd be happy to help with any new
> > framework. Specially on modern stateless, were there is a need to do better
> > scheduling.
> I didn't know the schedule problem about stateless codec, are they 
> supposed to be in the job queue when the buffers that DPB requests are 
> own by the driver and its registers are prepared except the trigger bit?

On RK3588 at least, decoder scheduling is going to be complex. There is an even
number of cores, but when you need to decode 8K, you have to pair two cores
(only specific cores can be paired with each other). We need decent
scheduling logic to ensure we don't starve an 8K decoding session when there are
multiple smaller-resolution sessions ongoing.

On MTK, the entropy decoding (LAT) and the reconstruction (CORE) are split. MTK
vcodec is using multiple workqueues to move jobs around, which is clearly
expensive. Also, the lifetime of a job is not exactly easy to manage.

On RPi HEVC (not upstream yet, but being worked on), the entropy decoding and
reconstruction are done on the same core, but remain 2 concurrent operations.
This does not pose a complex scheduling issue, but it raises the need for a way
to fully utilize such HW.

These are just some examples of complexity for which the current framework is
not that helpful (even though it's not impossible either).
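
To make the RK3588 starvation concern concrete, here is a toy allocator sketch. Everything in it is an assumption for illustration: the core count, the rule that cores 2k and 2k+1 form a pair, the single-reservation policy, and all names. It is not taken from any existing driver.

```c
#include <stdbool.h>

#define NUM_CORES 4	/* hypothetical core count, must be even */

struct core_sched {
	bool busy[NUM_CORES];
	int reserved;	/* first core of a pair held back for a waiting
			 * 8K session, or -1 if nothing is waiting */
};

/* Try to take a whole pair for an 8K session. On failure, reserve the
 * least busy pair so that small jobs stop refilling it. */
static int sched_get_pair(struct core_sched *s)
{
	int best = -1, best_busy = 3;

	for (int c = 0; c + 1 < NUM_CORES; c += 2) {
		int busy = s->busy[c] + s->busy[c + 1];

		if (busy == 0) {
			s->busy[c] = s->busy[c + 1] = true;
			s->reserved = -1;
			return c;
		}
		if (busy < best_busy) {
			best_busy = busy;
			best = c;
		}
	}
	if (s->reserved < 0)
		s->reserved = best;
	return -1;
}

/* Take one core for a small session, skipping any reserved pair. */
static int sched_get_single(struct core_sched *s)
{
	for (int c = 0; c < NUM_CORES; c++) {
		bool in_reserved = s->reserved >= 0 &&
				   (c == s->reserved || c == s->reserved + 1);

		if (!s->busy[c] && !in_reserved) {
			s->busy[c] = true;
			return c;
		}
	}
	return -1;
}

static void sched_put(struct core_sched *s, int core)
{
	s->busy[core] = false;
}
```

Without the reservation, a steady stream of small sessions could keep taking the one free core of each pair, so an 8K session would never see a fully idle pair. The sketch tracks only one waiting 8K session; a real scheduler would need a queue and some fairness policy.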

>   Just ping me if you have some effort starting, I don't currently
> > have a budget or bandwidth to write new drivers or port existing drivers them on
> > a newly written framework.
> > 
> > Nicolas
> > 
> > 
> > [...]
>
Randy Li July 24, 2023, 5:29 p.m. UTC | #7
On 2023/7/22 00:22, Nicolas Dufresne wrote:
> On Friday, 21 July 2023 at 16:56 +0800, Hsia-Jun Li wrote:
>> On 7/17/23 22:00, Nicolas Dufresne wrote:
>>> On Wednesday, 12 July 2023 at 09:31 +0000, Tomasz Figa wrote:
>>>> On Fri, Jul 07, 2023 at 03:14:23PM -0400, Nicolas Dufresne wrote:
>>>>> Hi Randy,
>>>>>
>>>>> On Tuesday, 4 July 2023 at 12:00 +0800, Hsia-Jun Li wrote:
>>>>>> From: Randy Li <ayaka@soulik.info>
>>>>>>
>>>>>> For the decoder supports Dynamic Resolution Change,
>>>>>> we don't need to allocate any CAPTURE or graphics buffer
>>>>>> for them at inital CAPTURE setup step.
>>>>>>
>>>>>> We need to make the device run or we can't get those
>>>>>> metadata.
>>>>>>
>>>>>> Signed-off-by: Randy Li <ayaka@soulik.info>
>>>>>> ---
>>>>>>    drivers/media/v4l2-core/v4l2-mem2mem.c | 5 +++--
>>>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> index 0cc30397fbad..c771aba42015 100644
>>>>>> --- a/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> +++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
>>>>>> @@ -301,8 +301,9 @@ static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
>>>>>>
>>>>>>     dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
>>>>>>
>>>>>> - if (!m2m_ctx->out_q_ctx.q.streaming
>>>>>> -     || !m2m_ctx->cap_q_ctx.q.streaming) {
>>>>>> + if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
>>>>>> +     || !(m2m_ctx->cap_q_ctx.q.streaming
>>>>>> +          || m2m_ctx->cap_q_ctx.buffered)) {
>>>>> I have a two atches with similar goals in my wave5 tree. It will be easier to
>>>>> upstream with an actual user, though, I'm probably a month or two away from
>>>>> submitting this driver again.
>>>>>
>>>>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/ac59eafd5076c4deb3bfe1fb85b3b776586ef3eb
>>>>> https://gitlab.collabora.com/chipsnmedia/kernel/-/commit/5de4fbe0abb20b8e8d862b654f93e3efeb1ef251
>>>>>
>>>> While I'm not going to NAK this series or those 2 patches if you send
>>>> them, I'm not really convinced that adding more and more complexity to
>>>> the mem2mem helpers is a good idea, especially since all of those seem
>>>> to be only needed by stateful video decoders.
>>>>
>>>> The mem2mem framework started as a set of helpers to eliminate
>>>> boilerplate from simple drivers that always get 1 CAPTURE and 1 OUTPUT buffer,
>>>> run 1 processing job on them and then return both of them to userspace,
>>>> and I think it should stay like this.
>>> It's a bit late to try and bring that argument. It should have been raised a couple
>>> of years ago (before I even started helping with these codecs). Now that all the
>>> newly written stateful decoders use this framework, it is logical to keep
>>> reducing the boilerplate for these too. In my opinion, the job_ready()
>>> callback should have been a lot more flexible from the start. And allowing
>>> drivers to make it more powerful does not really add that much complexity.
>>>
>>> Speaking of complexity, driving the output manually (outside of the job
>>> workqueue) during sequence initialization is way more complex and risky than
>>> this. Finally, sticking with the 1:1 pattern means encoders, detilers, image
>>> enhancement reducing framerate, etc. would all be unwelcome to use this. Which,
>>> in short, means no one should even use this.
>>>
>> I think those things are m2m, but they would be hard to represent in the current
>> m2m framework:
>> 1. N:1 compositor (it may be implemented as a loop running a 2:1 compositor).
> Correct, only SRC/DST/BG type of blitters can be supported for compositing,
> which is quite limiting. Currently there is no way to make an N:1 M2M, as M2M
> instances are implemented at the video node layer, and not at the MC layer. This
> is an entirely new subject and API design space to tackle (same goes for 1:N,
> like multi-scalers, SVC decoders, etc.).
The SVC case is the one I mentioned in the talk, although the major problem
may only happen with SVC-S.
>
>> 2. AV1 film grain
> For AV1/HEVC film grain, it is handled similarly to inline converters and scalers.

I know a few decoders on the market don't implement this feature in
their own hardware; they rely on other hardware.

Actually, it would be better to let an NPU do such a job.

> The driver secretly allocates the reference frames, and post-processes into the
> user visible buffers.
Hiding internal buffers is the worst case; frame buffers can be large.
> It breaks some assumptions made by most protected memory
> setups though, as not all allocations are user driven, meaning the decoder needs to
> know if it's secure or not. Secure memory is also another API design space to
> tackle.
>
>> 3. HDR with dynamic metadata to SDR
> True, but easy to design around the stateless model. I'm not worried about these.
The current stateless API won't support DMA buffers for the metadata.
>
>> The video things that fit the m2m model could be just a little less complex
>> than an ISP or camera pipeline. The only difference is that an ISP won't have
>> multiple contexts running at the same time.
> I thought that having the kernel schedule ISP reprocessing jobs (which requires
> instances) would be nice. But this can only be solved after we have solved the
> N:N use cases of m2m (with multiple instances).
>
>> If we could design a model for the video encoder, I think we could solve
>> most of the problems.
>> A video encoder would have:
>> 1. input graphics buffer
>> 2. reconstruction graphics buffer
>> 3. motion vector cache buffer (optional)
>> 4. coded bitstream output
>> 5. encoding statistics report
>>>> I think we're strongly in need of a stateful video decoder framework that
>>>> would actually address the exact problems that those have rather than
>>>> bending something that wasn't designed with them in mind to work around the
>>>> differences.
>>> The bend is already there, of course I'd be happy to help with any new
>>> framework. Especially for modern stateless, where there is a need to do better
>>> scheduling.
>> I didn't know about the scheduling problem with stateless codecs. Are they
>> supposed to be in the job queue when the buffers that the DPB requests are
>> owned by the driver and its registers are prepared except for the trigger bit?
> On RK3588 at least, decoder scheduling is going to be complex. There is an even
> number of cores, but when you need to decode 8K, you have to pair two cores
> (there is a specific set of cores that are to be paired with). We need a decent

How do two cores work in parallel? Tiles?

But AV1 could do intra block copy.

> scheduling logic to ensure we don't starve 8K decoding sessions when there are
> multiple smaller-resolution sessions ongoing.
>
> On MTK, the entropy decoding (LAT) and the reconstruction (CORE) are split. MTK
> vcodec is using multiple workqueues to move jobs around, which is clearly
> expensive. Also, the lifetime of a job is not exactly easy to manage.

This model sounds easy:

LAT produces a partial frame buffer with intra blocks, plus its motion
vector buffer.

CORE completes the frame from the motion vector buffer and its reference
buffers.

We just treat them as two separate hardware devices here.

>
> On RPi HEVC (not upstream yet, but being worked on), the entropy decoding and
> reconstruction are done on the same core, but remain 2 concurrent operations.
> This does not impose a complex scheduling issue, but it raises the need for a way to
> fully utilize such HW.

This sounds more complex than MTK's case. It would be hard to measure
the job length with an entropy part and an inter reconstruction part.

Although usually the latter one would consume more memory bandwidth or
hardware time.

>
> These are just some examples of complexity for which the current framework is not
> that helpful (even though it's not impossible either).
>
>>> Just ping me if you have some effort starting; I don't currently
>>> have a budget or bandwidth to write new drivers or port existing drivers to
>>> a newly written framework.
>>>
>>> Nicolas
>>>
>>>
>>> [...]

Patch

diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index 0cc30397fbad..c771aba42015 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -301,8 +301,9 @@  static void __v4l2_m2m_try_queue(struct v4l2_m2m_dev *m2m_dev,
 
 	dprintk("Trying to schedule a job for m2m_ctx: %p\n", m2m_ctx);
 
-	if (!m2m_ctx->out_q_ctx.q.streaming
-	    || !m2m_ctx->cap_q_ctx.q.streaming) {
+	if (!(m2m_ctx->out_q_ctx.q.streaming || m2m_ctx->out_q_ctx.buffered)
+	    || !(m2m_ctx->cap_q_ctx.q.streaming
+		 || m2m_ctx->cap_q_ctx.buffered)) {
 		dprintk("Streaming needs to be on for both queues\n");
 		return;
 	}
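For readers who want to see the effect of the changed condition in isolation, here is a minimal userspace sketch of the predicate before and after the patch. It is not kernel code: `struct queue_model` and the three helper functions are hypothetical names made up for illustration, and only the two booleans mirror the `q.streaming` and `buffered` fields read in the hunk above.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two per-queue fields the check reads.
 * This is an illustration only, not the kernel's queue context struct.
 */
struct queue_model {
	bool streaming; /* models q_ctx.q.streaming */
	bool buffered;  /* models q_ctx.buffered */
};

/* After the patch, a queue counts as usable if it streams OR is buffered. */
static bool queue_usable(const struct queue_model *q)
{
	return q->streaming || q->buffered;
}

/* Old condition: scheduling proceeds only if both queues are streaming. */
static bool old_check_passes(const struct queue_model *out,
			     const struct queue_model *cap)
{
	return out->streaming && cap->streaming;
}

/*
 * New condition: the early return is skipped when each queue is either
 * streaming or marked buffered (De Morgan of the patched if statement).
 */
static bool new_check_passes(const struct queue_model *out,
			     const struct queue_model *cap)
{
	return queue_usable(out) && queue_usable(cap);
}
```

Under this model, a decoder whose OUTPUT queue is streaming and whose CAPTURE queue is only marked buffered fails the old check but passes the new one, which is the Dynamic Resolution Change case described in the commit message. Two queues that are neither streaming nor buffered are rejected by both versions.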