
Re-encoding with variable frame rate

benedikt-grl opened this issue 7 months ago · 2 comments

First of all, thank you for maintaining PyAV. It is a very valuable tool.

I wanted to use PyAV to cut a video into shorter segments. The input video has a variable frame rate (mostly close to 30 fps, but sometimes much lower), and I would like to preserve the presentation timestamps. The issue is that I find it difficult to control the presentation timestamps in the resulting output videos.

To simplify the problem, I created a minimal example that decodes an input video frame by frame and re-encodes it with the original presentation timestamps.

import av

input_path = "..."
output_path = "..."

# Open the input container
input_container = av.open(input_path)
input_stream = input_container.streams.video[0]

# Open the output container
output_container = av.open(output_path, "w")

# Create output stream and copy some options from the input stream
out_stream = output_container.add_stream("libx264")
out_stream.width = input_stream.width
out_stream.height = input_stream.height
out_stream.pix_fmt = "yuv420p"
out_stream.time_base = input_stream.time_base
out_stream.options = {
    "crf": "0",
    "preset": "slow",
    "profile": "high444",
    "bf": "0",
    "colorprim": "bt709",
    "transfer": "bt709",
    "colormatrix": "bt709",
    "x264opts": "cabac=1",
}

# Iterate over the frames
for input_frame in input_container.decode(video=0):

    # Create a new frame with the pixel values of the input frame
    output_frame = av.VideoFrame.from_ndarray(input_frame.to_ndarray(), format=input_frame.format.name)

    # Copy pts from the input frame
    output_frame.pts = input_frame.pts

    # Encode and mux
    for packet in out_stream.encode(output_frame):
        output_container.mux(packet)

# Flush
for packet in out_stream.encode():
    output_container.mux(packet)

# Close containers
output_container.close()
input_container.close()

Let's compare the input and output videos with ffprobe:

# Input video
Video: h264 (High 4:4:4 Predictive), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], 30 fps, 30 tbr, 1k tbn (default)

# Output video
Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 16k tbn (default)

The problem is that the frame rate of the output video is much lower than that of the original video.

I tried several variants of the code above:

(1) Instead of out_stream.encode(output_frame), call out_stream.encode(input_frame). The program fails with a ValueError. I assume this is related to some attribute of the input_frame object that the encoder doesn't like, e.g., its DTS. To work around this, I created a new frame object from the input frame's data, as sketched below.
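For clarity, the failing call versus the workaround (a sketch of what is described above):

# (1a) Fails with a ValueError for me:
# for packet in out_stream.encode(input_frame):
#     output_container.mux(packet)

# (1b) Workaround: rebuild the frame from the decoded pixel data and copy its pts
output_frame = av.VideoFrame.from_ndarray(input_frame.to_ndarray(), format=input_frame.format.name)
output_frame.pts = input_frame.pts
for packet in out_stream.encode(output_frame):
    output_container.mux(packet)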

(2) Set the timescale of the output container, i.e., output_container = av.open(output_path, "w", container_options={"video_track_timescale": "1000"}).
ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 327 kb/s, 0.73 fps, 24 tbr, 1k tbn (default)

The container timebase (tbn) now matches the input video, but this had no effect on the frame rate.
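For reference, variant (2) only changes the open call; as far as I understand, video_track_timescale is a mov/mp4 muxer option that sets the track's tick rate, i.e., the tbn that ffprobe reports:

# Variant (2): request a 1000 Hz track timescale from the mp4 muxer,
# so the container time base matches the input's 1/1000 (assumes mp4/mov output)
output_container = av.open(
    output_path, "w",
    container_options={"video_track_timescale": "1000"},
)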

(3) Explicitly specify a rate, i.e., out_stream = output_container.add_stream("libx264", rate=30). ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 408 kb/s, 0.91 fps, 30 tbr, 16k tbn (default)

The tbr changed from 24 to 30, but the frame rate is still below 1 fps.

(4) Explicitly set a packet time base before muxing, i.e.

# Encode and mux
for packet in out_stream.encode(output_frame):
    packet.time_base = input_stream.time_base
    output_container.mux(packet)

ffprobe now shows:

Stream #0:0[0x1](und): Video: h264 (High 4:4:4 Predictive) (avc1 / 0x31637661), yuv420p(progressive), 1280x720, 13631 kb/s, 30.18 fps, 30 tbr, 16k tbn (default)

This is very close to the desired output, but I am skeptical that this is the correct solution because I have not seen any examples that set the packet time base. Also, the fps and the total duration do not perfectly match those of the input file.

Frankly, I am confused about which arguments I need to set in order to copy the input video's presentation timestamps. It would be super helpful to have an example that explains the effect of video_track_timescale, of the optional rate argument of output_container.add_stream, and of output_stream.time_base, output_frame.time_base, and packet.time_base. Something is rescaling the timestamps, and I would like to understand what it is.
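For concreteness, something like the following sketch can be used to compare the two files; it only prints Stream.time_base, Stream.average_rate, and the first few packet pts/dts/time_base values (input_path and output_path as above):

import av

def inspect(path, n=5):
    with av.open(path) as container:
        stream = container.streams.video[0]
        print(path)
        print("  stream.time_base    =", stream.time_base)
        print("  stream.average_rate =", stream.average_rate)
        # Print the raw timestamps of the first few packets
        for i, packet in enumerate(container.demux(stream)):
            if i >= n:
                break
            print("  packet", i, "pts =", packet.pts,
                  "dts =", packet.dts, "time_base =", packet.time_base)

inspect(input_path)
inspect(output_path)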

Thanks for your help!

benedikt-grl · Jul 17 '25 10:07

I'm facing the exact same issue and ended up with a similar workaround. It would be great to know how to do it properly.

atodniAr · Sep 05 '25 15:09

I'm also working on something similar, albeit by simply re-muxing packets after serializing them, and I'm running into similar issues. I can only get an approximately correct frame rate when I set the container's video timescale; if I don't, the output video is garbled and broken. My re-mux path has roughly the shape sketched below.
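A sketch of that re-mux path, leaving out the serialization step (add_stream_from_template is the PyAV 12+ spelling; older releases used add_stream(template=...)):

import av

input_container = av.open("in.mp4")
output_container = av.open(
    "out.mp4", "w",
    # Without this container option, the output is garbled for me
    container_options={"video_track_timescale": "1000"},
)

in_stream = input_container.streams.video[0]
out_stream = output_container.add_stream_from_template(in_stream)

for packet in input_container.demux(in_stream):
    # demux() emits flushing packets with no dts at EOF; skip them
    if packet.dts is None:
        continue
    packet.stream = out_stream
    output_container.mux(packet)

input_container.close()
output_container.close()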

harimohanraj · Nov 06 '25 19:11