mbox series

[v2,0/6] CODA timeout fix & assorted changes

Message ID 20201007103544.22807-1-ezequiel@collabora.com
Headers show
Series CODA timeout fix & assorted changes | expand

Message

Ezequiel Garcia Oct. 7, 2020, 10:35 a.m. UTC
Hi all,

The main motivation for this series is to address a PIC_RUN
timeout, which we managed to link with a hardware bitstream
buffer underrun condition.

Upon further investigation we discovered that the underrun
was produced by a subtle issue in the way buffer_meta's were
being tracked.

The issue is fixed by patch "5/6 coda: coda_buffer_meta housekeeping fix".

Patches 1 to 4 are mostly cleanups and minor cosmetic changes.

Finally, while testing with corrupted bitstreams we realized
the driver was logging too verbosely, so patch 6 addresses
this by introducing a private control to read an macroblock-error
counter.

These patches have been tested against media's upstream
and v5.4-based, on i.MX6 (Wandboard) with H.264 and MPEG-2
streams.

In particular, this series manages to fix the PIC_RUN
timeout reported by Benjamin Bara [1].

I believe this timeout can be systematically reproduced with
a video containing small black frames, which can be generated with:

gst-launch-1.0 videotestsrc pattern=black num-buffers=300 ! \
video/x-raw,format=I420,width=176,height=120 ! avenc_mpeg2video ! \
mpegvideoparse ! mpegtsmux ! filesink location=black-qcif-10s.ts

Reviews and feedback are appreciated, as always.

[1] https://lkml.org/lkml/2020/8/21/495

Changelog
---------

v2:
* Keep the error MB message, but move it to coda_dbg(1, ctx).
* Add per-device rate limitting for the error MB message.
* Rename V4L2_CID_CODA_ERR_MB description.
* s/__coda_decoder_drop_used_metas/coda_decoder_drop_used_metas

Ezequiel Garcia (6):
  coda: Remove redundant ctx->initialized setting
  coda: Simplify H.264 small buffer padding logic
  coda: Clarify device registered log
  coda: Clarify interrupt registered name
  coda: coda_buffer_meta housekeeping fix
  coda: Add a V4L2 user for control error macroblocks count

 MAINTAINERS                               |  1 +
 drivers/media/platform/coda/coda-bit.c    | 72 ++++++++++++++++-------
 drivers/media/platform/coda/coda-common.c | 57 +++++++++++++++---
 drivers/media/platform/coda/coda.h        |  5 ++
 include/media/drv-intf/coda.h             | 13 ++++
 include/uapi/linux/v4l2-controls.h        |  4 ++
 6 files changed, 122 insertions(+), 30 deletions(-)
 create mode 100644 include/media/drv-intf/coda.h

Comments

Fabio Estevam Oct. 7, 2020, 11:13 a.m. UTC | #1
Hi Ezequiel,

On Wed, Oct 7, 2020 at 8:01 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>

> Hi all,

>

> The main motivation for this series is to address a PIC_RUN

> timeout, which we managed to link with a hardware bitstream

> buffer underrun condition.

>

> Upon further investigation we discovered that the underrun

> was produced by a subtle issue in the way buffer_meta's were

> being tracked.

>

> The issue is fixed by patch "5/6 coda: coda_buffer_meta housekeeping fix".

>

> Patches 1 to 4 are mostly cleanups and minor cosmetic changes.


Shouldn't the fix be the first patch in the series so that it can be
backported to stable?

Thanks
Ezequiel Garcia Oct. 7, 2020, 11:34 a.m. UTC | #2
Hi Fabio,

On Wed, 2020-10-07 at 08:13 -0300, Fabio Estevam wrote:
> Hi Ezequiel,

> 

> On Wed, Oct 7, 2020 at 8:01 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:

> > Hi all,

> > 

> > The main motivation for this series is to address a PIC_RUN

> > timeout, which we managed to link with a hardware bitstream

> > buffer underrun condition.

> > 

> > Upon further investigation we discovered that the underrun

> > was produced by a subtle issue in the way buffer_meta's were

> > being tracked.

> > 

> > The issue is fixed by patch "5/6 coda: coda_buffer_meta housekeeping fix".

> > 

> > Patches 1 to 4 are mostly cleanups and minor cosmetic changes.

> 

> Shouldn't the fix be the first patch in the series so that it can be

> backported to stable?

> 


While that is typically the case, it won't change much.

You'll find that the fix in patch 5 can be applied cleanly
on top of v5.4 and v5.8 :-)

Now that you mention this, I believe that the patch
should carry:

Cc: stable@vger.kernel.org # v5.4

Thanks,
Ezequiel
Fabio Estevam Oct. 12, 2020, 1:06 p.m. UTC | #3
On Wed, Oct 7, 2020 at 8:34 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:

> While that is typically the case, it won't change much.

>

> You'll find that the fix in patch 5 can be applied cleanly

> on top of v5.4 and v5.8 :-)

>

> Now that you mention this, I believe that the patch

> should carry:

>

> Cc: stable@vger.kernel.org # v5.4


And a Fixes tag too :-)
Benjamin Bara - SKIDATA Oct. 12, 2020, 2:47 p.m. UTC | #4
Hi Ezequiel!

I applied your series on top of bbf5c979011a099af5dc76498918ed7df445635b (5.9 tag).
Additionally, I applied the attached patch to print the active metas during coda_fill_bitstream():
0001-DEBUG-coda-print-list-of-current-active-metas.patch

For verification, I used the attached single-color video:
MPEG4, 512x512, ~135 kb/s, 20 FPS, GoP = 5; see below for reproduction steps.
This might be a very "hard" and "theoretical" sample.
However, with this video, I still get a timeout (see error log below).

When applying my "simplistic approach" (second patch) on top of that, the video is working.
I saw a padding to 512 byte in the H.264 implementation, therefore I assumed it's the same here.
Since I don't know how to pad the MPEG4 stream, I decided to keep adding frames
until the "requirement" (I have no documentation for that) is reached.
Therefore, this patch ensures that there are at least two buffers queued
which reach the 512 byte threshold (for simplistic reasons 3, since "beyond the current one").

Note that the second patch is also just a proof of concept, there might be more efficient solutions.

If your reproduction attempts lead to different results, please write me an email.

Regards & many thanks!
Benjamin


*Video creation:*
1.) Create single color video:
gst-launch-1.0 videotestsrc num-buffers=100 pattern=blue ! \
video/x-raw,format=I420,width=512,height=512 ! v4l2mpeg4enc output-io-mode=4 ! \
avimux ! filesink location=temp.avi

2.) Re-encode with a GoP (IIRC this option is not provided by v4l2mpeg4enc,
even though enabled in the driver; I usually do this on my test pc, not the SuT):
ffmpeg -i temp.avi -vf scale=512:512 -vtag "xvid" -g 5 mpeg4.avi

*Error log:*
[ 1067.150368] coda 2040000.vpu: 0: start streaming vid-cap
[ 1067.164871] coda 2040000.vpu: 0: not ready: need 2 buffers available (queue:1 + bitstream:0)
[ 1067.165490] coda 2040000.vpu: 0: job ready
[ 1067.166682] coda 2040000.vpu: 0: active metas:
[ 1067.166692] coda 2040000.vpu: 0: - payload: 60
[ 1067.166697] coda 2040000.vpu: 0: - payload: 2913
[ 1067.166702] coda 2040000.vpu: 0: - payload: 155
[ 1067.166706] coda 2040000.vpu: 0: - payload: 155
[ 1067.166711] coda 2040000.vpu: 0: - payload: 155
[ 1067.166715] coda 2040000.vpu: 0: want to queue: payload: 155
[ 1067.166973] coda 2040000.vpu: 0: active metas:
[ 1067.166982] coda 2040000.vpu: 0: - payload: 60
[ 1067.166987] coda 2040000.vpu: 0: - payload: 2913
[ 1067.166992] coda 2040000.vpu: 0: - payload: 155
[ 1067.166996] coda 2040000.vpu: 0: - payload: 155
[ 1067.167000] coda 2040000.vpu: 0: - payload: 155
[ 1067.167005] coda 2040000.vpu: 0: want to queue: payload: 155
[ 1068.168449] coda 2040000.vpu: CODA PIC_RUN timeout
Ezequiel Garcia Nov. 4, 2020, 5 p.m. UTC | #5
Hi Benjamin,

On Mon, 2020-10-12 at 14:47 +0000, Benjamin Bara - SKIDATA wrote:
> Hi Ezequiel!
> 
> I applied your series on top of bbf5c979011a099af5dc76498918ed7df445635b (5.9 tag).
> Additionally, I applied the attached patch to print the active metas during coda_fill_bitstream():
> 0001-DEBUG-coda-print-list-of-current-active-metas.patch
> 
> For verification, I used the attached single-color video:
> MPEG4, 512x512, ~135 kb/s, 20 FPS, GoP = 5; see below for reproduction steps.
> This might be a very "hard" and "theoretical" sample.
> However, with this video, I still get a timeout (see error log below).
> 
> When applying my "simplistic approach" (second patch) on top of that, the video is working.
> I saw a padding to 512 byte in the H.264 implementation, therefore I assumed it's the same here.
> Since I don't know how to pad the MPEG4 stream, I decided to keep adding frames
> until the "requirement" (I have no documentation for that) is reached.
> Therefore, this patch ensures that there are at least two buffers queued
> which reach the 512 byte threshold (for simplistic reasons 3, since "beyond the current one").
> 
> Note that the second patch is also just a proof of concept, there might be more efficient solutions.
> 
> If your reproduction attempts lead to different results, please write me an email.
> 
> Regards & many thanks!
> Benjamin
> 
> 
> *Video creation:*
> 1.) Create single color video:
> gst-launch-1.0 videotestsrc num-buffers=100 pattern=blue ! \
> video/x-raw,format=I420,width=512,height=512 ! v4l2mpeg4enc output-io-mode=4 ! \
> avimux ! filesink location=temp.avi
> 
> 2.) Re-encode with a GoP (IIRC this option is not provided by v4l2mpeg4enc,
> even though enabled in the driver; I usually do this on my test pc, not the SuT):
> ffmpeg -i temp.avi -vf scale=512:512 -vtag "xvid" -g 5 mpeg4.avi
> 
> *Error log:*
> [ 1067.150368] coda 2040000.vpu: 0: start streaming vid-cap
> [ 1067.164871] coda 2040000.vpu: 0: not ready: need 2 buffers available (queue:1 + bitstream:0)
> [ 1067.165490] coda 2040000.vpu: 0: job ready
> [ 1067.166682] coda 2040000.vpu: 0: active metas:
> [ 1067.166692] coda 2040000.vpu: 0: - payload: 60
> [ 1067.166697] coda 2040000.vpu: 0: - payload: 2913
> [ 1067.166702] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166706] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166711] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166715] coda 2040000.vpu: 0: want to queue: payload: 155
> [ 1067.166973] coda 2040000.vpu: 0: active metas:
> [ 1067.166982] coda 2040000.vpu: 0: - payload: 60
> [ 1067.166987] coda 2040000.vpu: 0: - payload: 2913
> [ 1067.166992] coda 2040000.vpu: 0: - payload: 155
> [ 1067.166996] coda 2040000.vpu: 0: - payload: 155
> [ 1067.167000] coda 2040000.vpu: 0: - payload: 155
> [ 1067.167005] coda 2040000.vpu: 0: want to queue: payload: 155
> [ 1068.168449] coda 2040000.vpu: CODA PIC_RUN timeout
> 

Many thanks for your report. Indeed you managed to create
a video that is quite problematic for CODA.

Adding some debugging code to dump the metas, and then 
inspect the bitstream payload before/after a sync,
we can see that the hardware buffer has 690 bytes.

This seems a bit confusing, since the driver considers
it's enough. From the log below, isn't this supposed
to be enough metas?

[  274.087643] coda 2040000.vpu: CODA PIC_RUN timeout
[  274.092655] coda 2040000.vpu: 	meta 2: 432 - 804 (00 00 01 b0 01)
[  274.099133] coda 2040000.vpu: 	meta 3: 804 - 1176 (00 00 01 b0 01)
[  274.105551] coda 2040000.vpu: 	meta 4: 1176 - 1548 (00 00 01 b0 01)
[  274.112200] coda 2040000.vpu: 	meta 5: 1548 - 1920 (00 00 01 b0 01)
[  274.118840] coda 2040000.vpu: 	meta 6: 1920 - 2292 (00 00 01 b0 01)
[  274.125347] coda 2040000.vpu: bitstream payload: 690 (before sync)
[  274.130627] coda 2040000.vpu: bitstream payload: 690 (after sync)

Philipp, what do you think?

I must admit I can't fully wrap my head around
your prefetch fix, and how the new condition
would fix this issue. Could you explain it in more detail?

Why wouldn't the prefetcher count chunks smaller than 512?
Do you have some documentation for that?

Thanks,
Ezequiel
Benjamin Bara - SKIDATA Nov. 4, 2020, 5:49 p.m. UTC | #6
> -----Original Message-----

> From: Ezequiel Garcia <ezequiel@collabora.com>

> Sent: Mittwoch, 4. November 2020 18:01


Hi again!
 
> Many thanks for your report. Indeed you managed to create a video that is quite

> problematic for CODA.


Thanks for looking at it!

> I must admit I can't fully wrap my head around your prefetch fix, and how the new

> condition would fix this issue. Could you explain it in more detail?


I don't have access to the CODA960 documentation, so all my findings are based on tests
or code documentation.

My starting point was the inline documentation [1]:
"If we managed to fill in at least a full reorder window of buffers and the bitstream
 prefetcher has at least 2 256 bytes periods beyond the first buffer to fetch (...)".

The current code checks if there are 512 bytes after the current meta.
When I comment out the following break, my test videos are working.
So for me, starvation is caused by this statement, because it hinders follow-up metas
for performance reasons (the decoder might also start hiccuping with too much data enqueued,
but not sure).

I did some tests, and basically I found out that the sum doesn't matter, so something like
- Meta 2: 1024
- Meta 3: 100
- Meta 4: 100
starves, even if it passes the current check.

Next, I thought about "2x 256" is actually not the same as "1x 512".
With 2x 256, I could actually got a couple more test videos running, something like:
- Meta 2: 650
- Meta 3: 50
- Meta 4: 650

The crafting of such video is quite easy by varying the Group-of-Picture and the resolution.

However, then I tried some smaller videos (with meta 2 & 4: 256 < meta < 512) 
which lead to starvation again.
I haven't done additional tests, instead I shared my findings and asked for further documentation
hints about the actual starvation criteria.

So my solution is just a Proof-of-Concept, found by testing and is not backed by documentation - 
therefore I am also not sure if it is sufficient or has any side effects.

Best regards
Benjamin 

[1] https://elixir.bootlin.com/linux/v5.10-rc2/source/drivers/media/platform/coda/coda-bit.c#L352
Ezequiel Garcia Nov. 26, 2020, 11:41 p.m. UTC | #7
Hi Benjamin,

On Wed, 2020-11-04 at 17:49 +0000, Benjamin Bara - SKIDATA wrote:
> > -----Original Message-----

> > From: Ezequiel Garcia <ezequiel@collabora.com>

> > Sent: Mittwoch, 4. November 2020 18:01

> 

> Hi again!

>  

> > Many thanks for your report. Indeed you managed to create a video that is quite

> > problematic for CODA.

> 

> Thanks for looking at it!

> 

> > I must admit I can't fully wrap my head around your prefetch fix, and how the new

> > condition would fix this issue. Could you explain it in more detail?

> 

> I don't have access to the CODA960 documentation, so all my findings are based on tests

> or code documentation.

> 

> My starting point was the inline documentation [1]:

> "If we managed to fill in at least a full reorder window of buffers and the bitstream

>  prefetcher has at least 2 256 bytes periods beyond the first buffer to fetch (...)".

> 

> The current code checks if there are 512 bytes after the current meta.

> When I comment out the following break, my test videos are working.

> So for me, starvation is caused by this statement, because it hinders follow-up metas

> for performance reasons (the decoder might also start hiccuping with too much data enqueued,

> but not sure).

> 

> I did some tests, and basically I found out that the sum doesn't matter, so something like

> - Meta 2: 1024

> - Meta 3: 100

> - Meta 4: 100

> starves, even if it passes the current check.

> 

> Next, I thought about "2x 256" is actually not the same as "1x 512".

> With 2x 256, I could actually got a couple more test videos running, something like:

> - Meta 2: 650

> - Meta 3: 50

> - Meta 4: 650

> 

> The crafting of such video is quite easy by varying the Group-of-Picture and the resolution.

> 

> However, then I tried some smaller videos (with meta 2 & 4: 256 < meta < 512) 

> which lead to starvation again.

> I haven't done additional tests, instead I shared my findings and asked for further documentation

> hints about the actual starvation criteria.

> 

> So my solution is just a Proof-of-Concept, found by testing and is not backed by documentation - 

> therefore I am also not sure if it is sufficient or has any side effects.

> 


Please submit your patch on top of latest media master branch and let's
start discussing it.

BTW, Nicolas and I have started using https://github.com/fluendo/fluster/
conformance testing. Perhaps you can run the H264 test suite, with & without
your patch (assuming it affects H264 in any way) and include that information
in the cover letter.

Thanks!
Ezequiel