
Count pixels in segment without decoding video data

Open darkdarkdragon opened this issue 6 years ago • 6 comments

Is your feature request related to a problem? Please describe. The current API for counting pixels in segments does actual video stream decoding. This API is used by the broadcaster, and will make it CPU constrained instead of memory constrained.

Describe the solution you'd like Each video frame could potentially have a different resolution, so we can't just multiply the video length by the frame rate. But we could parse the bitstream, extracting the needed information without decoding the video itself. FFmpeg appears to have an internal API for this, but it isn't exposed to library consumers, so FFmpeg would have to be forked to expose it.

Describe alternatives you've considered Writing our own bitstream parser, but this is not a good option.
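To make the goal concrete, here is a minimal sketch of the accounting we want, assuming the per-frame dimensions have already been recovered from the bitstream headers (the `dims` type and `countPixels` helper are hypothetical, not part of lpms):

```go
package main

import "fmt"

// dims holds the resolution of one frame, as recovered from the
// bitstream headers (SPS) rather than from decoding the pictures.
type dims struct{ w, h uint64 }

// countPixels sums width*height across frames; because each frame may
// be preceded by a new SPS, we cannot simply multiply one resolution
// by the frame count.
func countPixels(frames []dims) uint64 {
	var total uint64
	for _, d := range frames {
		total += d.w * d.h
	}
	return total
}

func main() {
	// Hypothetical segment where the resolution changes mid-stream.
	frames := []dims{{1280, 720}, {1280, 720}, {1920, 1080}}
	fmt.Println(countPixels(frames)) // prints 3916800
}
```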

darkdarkdragon avatar Aug 21 '19 14:08 darkdarkdragon

Another possible solution - joy4? It already has h264 parsing code; maybe it would be enough for our task? @j0sh what do you think?

darkdarkdragon avatar Aug 21 '19 14:08 darkdarkdragon

@darkdarkdragon Good thinking on joy4. From a quick glance over the code here and here, joy4 would probably give us the latest width and height out of the box. We'd have to do some work to transform the results of SplitNALUs into the information we need (eg, take each SPS/PPS, count the frames/slices after each, and so forth).
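The "take each SPS, count the slices after it" bookkeeping could look roughly like this. This is a stdlib-only sketch: the hand-rolled Annex-B splitter stands in for joy4's `SplitNALUs`, and the helper names are hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
)

// splitAnnexB splits an Annex-B byte stream on 00 00 01 / 00 00 00 01
// start codes; a stand-in for joy4's h264parser.SplitNALUs.
func splitAnnexB(b []byte) [][]byte {
	var nalus [][]byte
	sc := []byte{0, 0, 1}
	for {
		i := bytes.Index(b, sc)
		if i < 0 {
			break
		}
		b = b[i+3:]
		end := len(b)
		if j := bytes.Index(b, sc); j >= 0 {
			end = j
			if j > 0 && b[j-1] == 0 { // 4-byte start code
				end = j - 1
			}
		}
		if end > 0 {
			nalus = append(nalus, b[:end])
		}
	}
	return nalus
}

// slicesPerSPS counts how many slice NALUs (types 1 and 5) follow each
// SPS (type 7) — the bookkeeping needed to attribute frames to the
// resolution that SPS declares.
func slicesPerSPS(nalus [][]byte) []int {
	var counts []int
	for _, n := range nalus {
		switch n[0] & 0x1F { // nal_unit_type
		case 7: // SPS
			counts = append(counts, 0)
		case 1, 5: // non-IDR / IDR slice
			if len(counts) > 0 {
				counts[len(counts)-1]++
			}
		}
	}
	return counts
}

func main() {
	// Synthetic stream: SPS, PPS, IDR slice, non-IDR slice, SPS, IDR slice.
	stream := []byte{
		0, 0, 0, 1, 0x67, 0xAA, // SPS
		0, 0, 0, 1, 0x68, 0xBB, // PPS
		0, 0, 1, 0x65, 0x01, // IDR slice
		0, 0, 1, 0x41, 0x02, // non-IDR slice
		0, 0, 0, 1, 0x67, 0xCC, // new SPS
		0, 0, 1, 0x65, 0x03, // IDR slice
	}
	fmt.Println(slicesPerSPS(splitAnnexB(stream))) // prints [2 1]
}
```

A real implementation would additionally parse each SPS for `pic_width_in_mbs_minus1` etc. to get the actual dimensions, which joy4's SPS parser already does.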

For the goclient, we might end up needing to do a full decode anyway for other purposes (eg, verification) so the current approach might end up being OK for our needs.

Once we have a good solution to https://github.com/livepeer/lpms/issues/139 then we can probably skip this pixel-counting step for most cases, because the B can then make a decent estimation of the number of pixels to expect in the result. If the estimate happens to be "off", maybe due to fluctuations in the source frame rate, then the B could fall back to a full decode + verification step.

j0sh avatar Aug 21 '19 18:08 j0sh

@j0sh I'm not familiar with the low-level h264 protocol. So, width/height information is contained only in SPS/PPS NALUs, not inside VCL packets? After your comment in the decode PR I thought that every frame can have its own resolution? If resolution information is contained solely in SPS/PPS packets, then it looks like we could easily rewrite the segmenter with joy4 and derive all the information we need from it.

For the goclient, we might end up needing to do a full decode anyway for other purposes (eg, verification)

That would depend on how verification is implemented; plus, it would sit in a different part of the data flow.

darkdarkdragon avatar Aug 21 '19 18:08 darkdarkdragon

After your comment in decode PR I thought that every frame can have own resolution?

Yes - by putting a new SPS in front of each frame.

rewrite segmenter with joy4 and derive every information we needed in it.

That's something else entirely :) What makes me nervous here is that joy4 is unmaintained and its interoperability story isn't the best - eg, we have rtmp issues with it. ffmpeg is actively maintained, has been battle-tested with a huge variety of content, has lots of knobs we can utilize, and so forth. Rewriting the segmenter in joy4 means taking all that maintenance burden on ourselves, which is something we can't really afford to do right now.

j0sh avatar Aug 21 '19 23:08 j0sh

Yes - by putting a new SPS in front of each frame.

OK, so current joy4 can be used to count pixels for a whole segment. Also, for the current RTMP input, we are already passing all the packets through joy4, and we could count pixels at that point (but this has the problem of synchronizing that data with the segments produced by FFmpeg).

Rewriting the segmenter in joy4 means taking all that maintenance burden on ourselves,

I agree here, but we periodically want to make changes to FFmpeg's segmenter (like counting pixels in it, or getting the stream's resolution), and that can only be done by maintaining our own fork, which is also not good. Plus, the whole scheme of re-sending the RTMP stream to FFmpeg and polling the filesystem for segments looks very inefficient, and it always loses some number of frames at the start of the stream. With all that, we probably have reasons to maintain our own joy4 segmenter somewhere in the future.

darkdarkdragon avatar Aug 22 '19 22:08 darkdarkdragon

Not sure if it is still important, but a small comment here: putting an SPS "in front" of each frame does nothing by itself. It is typical for the stream to contain an SPS, a PPS, and then an IDR picture, but it doesn't have to be so. In fact, the standard even permits transmitting SPS and PPS "out of band".

The way things work is, every slice header "invokes" a PPS by including a PPS id field (and the standard says every slice header of the same frame has to "invoke" the same PPS). In turn, every PPS "invokes" an SPS (also by containing an SPS id). The standard also says the "activated sequence parameter set RBSP shall remain active for the entire coded video sequence". And since a video sequence begins with an IDR picture, it is more like "the IDR picture is the one and only chance to change stream resolution".
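The activation chain described above (slice header → PPS id → SPS id → dimensions) amounts to a pair of table lookups. A minimal sketch, with hypothetical struct and table names:

```go
package main

import "fmt"

// sps carries only the fields relevant to pixel accounting.
type sps struct{ width, height int }

// pps names the SPS it activates via its seq_parameter_set_id field.
type pps struct{ spsID int }

// resolutionFor resolves a slice header's pic_parameter_set_id to the
// dimensions of the SPS that PPS activates, mirroring the two-step
// "invocation" in the standard.
func resolutionFor(ppsID int, ppsTable map[int]pps, spsTable map[int]sps) (int, int) {
	p := ppsTable[ppsID]
	s := spsTable[p.spsID]
	return s.width, s.height
}

func main() {
	spsTable := map[int]sps{0: {1920, 1080}, 1: {1280, 720}}
	ppsTable := map[int]pps{0: {spsID: 0}, 1: {spsID: 1}}
	w, h := resolutionFor(1, ppsTable, spsTable)
	fmt.Println(w, h) // prints 1280 720
}
```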

One can also see this behaviour when programming low-level hardware decoders. It is customary for them to check for a resolution change before every IDR frame; if a change is about to occur, they usually close decoding, free any frame buffers, then allocate a new set before reopening with the new resolution.
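That teardown-and-reallocate pattern can be sketched roughly as follows; this is an illustrative model, not any real driver API, and the 4-buffer pool size is an arbitrary assumption:

```go
package main

import "fmt"

// frame is a unit of input; only IDR frames may carry a new resolution.
type frame struct {
	idr  bool
	w, h int
}

// decoder models the low-level pattern: frame buffers sized for one
// resolution, torn down and reallocated only at an IDR boundary.
type decoder struct {
	w, h int
	bufs [][]byte
}

func (d *decoder) alloc(w, h int) {
	d.w, d.h = w, h
	d.bufs = make([][]byte, 4) // hypothetical 4-buffer pool
	for i := range d.bufs {
		d.bufs[i] = make([]byte, w*h*3/2) // 4:2:0 frame size
	}
}

func (d *decoder) decode(f frame) {
	if f.idr && (f.w != d.w || f.h != d.h) {
		// A resolution change is only legal at an IDR, so this is the
		// one place buffers are released and a new set is allocated.
		d.bufs = nil
		d.alloc(f.w, f.h)
		fmt.Printf("reallocated for %dx%d\n", f.w, f.h)
	}
	// ... actual slice decoding would go here ...
}

func main() {
	d := &decoder{}
	d.alloc(1280, 720)
	for _, f := range []frame{
		{idr: true, w: 1280, h: 720},
		{idr: false, w: 1280, h: 720},
		{idr: true, w: 1920, h: 1080},
	} {
		d.decode(f)
	}
	// prints: reallocated for 1920x1080
}
```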

MikeIndiaAlpha avatar Mar 23 '22 09:03 MikeIndiaAlpha