Audio-video desync for a three-hour-long video
Expected behavior: ffmpeg-normalize should produce an output in which audio and video stay in sync.
Actual behavior: For a long video, ffmpeg-normalize produces an output with desynced audio and video. The audio runs ahead of the video, so a sound is heard before the corresponding action appears on screen. The size of the desync grows as the video plays.
Command (the exact command you were trying to run):
ffmpeg-normalize --debug input.mp4 -c:a aac -o output.mp4
Any output you get when running the command with the --debug flag:
DEBUG: found executable in path: /home/ning/miniconda3/envs/armada/bin/ffmpeg
DEBUG: found executable in path: /home/ning/miniconda3/envs/armada/bin/ffmpeg
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-filters']
DEBUG: Parsing streams of input.mp4
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-i', 'input.mp4', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', '/dev/null']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version 5.0 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 10.3.0 (GCC)
configuration: --prefix=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl
--enable-demuxer=dash --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-vaapi --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-libvpx --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/pkg-config
libavutil 57. 17.100 / 57. 17.100
libavcodec 59. 18.100 / 59. 18.100
libavformat 59. 16.100 / 59. 16.100
libavdevice 59. 4.100 / 59. 4.100
libavfilter 8. 24.100 / 8. 24.100
libswscale 6. 4.100 / 6. 4.100
libswresample 4. 3.100 / 4. 3.100
libpostproc 56. 3.100 / 56. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.16.100
Duration: 03:12:16.80, start: 0.000000, bitrate: 3074 kb/s
Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 130 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Stream #0:1[0x2](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, unknown/bt470bg/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 2928 kb/s, 60 fps, 60 tbr, 15360 tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Output #0, null, to '/dev/null':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.16.100
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 130 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, unknown/bt470bg/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 2928 kb/s, 60 fps, 60 tbr, 15360 tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame= 2 fps=0.0 q=-1.0 Lsize=N/A time=00:00:00.00 bitrate=N/A speed= 0x
video:96kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
DEBUG: Found duration: 11536.08 s
DEBUG: Found audio stream at index 0
DEBUG: Found video stream at index 1
INFO: Normalizing file input.mp4 (1 of 1)
DEBUG: Running normalization for input.mp4
DEBUG: Parsing normalization info for input.mp4
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-nostdin', '-y', '-i', 'input.mp4', '-filter_complex', '[0:0]loudnorm=i=-23.0:lra=7.0:tp=-2.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', '/dev/null']
(Missing output in the middle, here, because the amount of output surpassed my terminal emulator's max lines.)
bitrate=N/A
total_size=N/A
out_time_us=11523300000
out_time_ms=11523300000
out_time=03:12:03.300000
dup_frames=0
drop_frames=0
speed=31.5x
progress=continue
size=N/A time=03:12:12.88 bitrate=N/A speed=31.5x
bitrate=N/A
total_size=N/A
out_time_us=11532887078
out_time_ms=11532887078
out_time=03:12:12.887078
dup_frames=0
drop_frames=0
speed=31.5x
progress=end
video:0kB audio:8649665kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x5623ceabf400]
{
"input_i" : "-16.92",
"input_tp" : "2.72",
"input_lra" : "17.70",
"input_thresh" : "-28.76",
"output_i" : "-23.33",
"output_tp" : "-2.00",
"output_lra" : "11.20",
"output_thresh" : "-34.57",
"normalization_type" : "dynamic",
"target_offset" : "0.33"
}
DEBUG: Loudnorm stats parsed: {"input_i": "-16.92", "input_tp": "2.72", "input_lra": "17.70", "input_thresh": "-28.76", "output_i": "-23.33", "output_tp": "-2.00", "output_lra": "11.20", "output_thresh": "-34.57", "normalization_type": "dynamic", "target_offset": "0.33"}
INFO: Running second pass for input.mp4
WARNING: The sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-y', '-nostdin', '-i', 'input.mp4', '-filter_complex', '[0:0]loudnorm=i=-23.0:lra=7.0:tp=-2.0:offset=0.33:measured_i=-16.92:measured_lra=17.7:measured_tp=2.72:measured_thresh=-28.76:linear=true:print_format=json[norm0]', '-map_metadata', '0', '-map_metadata:s:a:0', '0:s:a:0', '-map_metadata:s:v:0', '0:s:v:0', '-map_chapters', '0', '-map', '0:1', '-c:v', 'copy', '-map', '[norm0]', '-c:a', 'aac', '-c:s', 'copy', '/tmp/ble6nigo.mp4']
(Again, omitting output that had been lost to my terminal's max lines.)
frame=692116
fps=858.24
stream_0_0_q=-1.0
bitrate=3058.5kbits/s
total_size=4410048560
out_time_us=11535216732
out_time_ms=11535216732
out_time=03:12:15.216732
dup_frames=0
drop_frames=0
speed=14.3x
progress=continue
frame=692207 fps=858 q=-1.0 Lsize= 4337200kB time=03:12:16.73 bitrate=3079.8kbits/s speed=14.3x
frame=692207
fps=857.92
stream_0_0_q=-1.0
bitrate=3079.8kbits/s
total_size=4441292419
out_time_us=11536733398
out_time_ms=11536733398
out_time=03:12:16.733398
dup_frames=0
drop_frames=0
speed=14.3x
progress=end
video:4123902kB audio:183378kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.694622%
[Parsed_loudnorm_0 @ 0x558920476b40]
{
"input_i" : "-16.92",
"input_tp" : "2.72",
"input_lra" : "17.70",
"input_thresh" : "-28.76",
"output_i" : "-22.99",
"output_tp" : "-2.00",
"output_lra" : "11.10",
"output_thresh" : "-34.24",
"normalization_type" : "dynamic",
"target_offset" : "-0.01"
}
[aac @ 0x5589203227c0] Qavg: 771.614
DEBUG: Moving temporary file from /tmp/ble6nigo.mp4 to output.mp4
DEBUG: Normalization finished
INFO: Normalized file written to output.mp4
Environment (please complete the following information):
- [x] Your operating system
- [x] Your Python version / distribution (python --version)
- [x] Your ffmpeg version (ffmpeg -version)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux bookworm/sid
Release: testing
Codename: bookworm
$ python --version
Python 3.10.2
$ ffmpeg -version
ffmpeg version 5.0 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 10.3.0 (GCC)
configuration: --prefix=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-demuxer=dash --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-vaapi --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-libvpx --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/pkg-config
libavutil 57. 17.100 / 57. 17.100
libavcodec 59. 18.100 / 59. 18.100
libavformat 59. 16.100 / 59. 16.100
libavdevice 59. 4.100 / 59. 4.100
libavfilter 8. 24.100 / 8. 24.100
libswscale 6. 4.100 / 6. 4.100
libswresample 4. 3.100 / 4. 3.100
libpostproc 56. 3.100 / 56. 3.100
Thanks for providing a detailed report. With these kinds of errors, there probably isn't much ffmpeg-normalize can do about it, since it is just calling ffmpeg to do the work.
Just to make sure: the input itself has no drifting timestamps?
Does the following reproduce the error?
ffmpeg -i input.mp4 -c:a aac -c:v copy output.mp4
Does the following also exhibit the same drift?
ffmpeg -i input.mp4 -c copy output.mp4
I'm not sure what a drifting timestamp is. The input.mp4 plays fine with audio/video in sync. Is there a way I can otherwise check for a drifting timestamp?
Both of the commands you provided produced an output without desync. I tried using the same arguments with ffmpeg-normalize, but they were rejected with an argument error (you probably saw this coming):
$ ffmpeg-normalize input.mp4 -c:a aac -c:v copy -o sol3.mp4
WARNING: The sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
usage: ffmpeg-normalize input [input ...]
[-h]
[-o OUTPUT [OUTPUT ...]] [-of OUTPUT_FOLDER]
[-f] [-d] [-v] [-q] [-n] [-pr]
[--version]
[-nt {ebu,rms,peak}] [-t TARGET_LEVEL] [-p]
[-lrt LOUDNESS_RANGE_TARGET] [-tp TRUE_PEAK] [--offset OFFSET] [--dual-mono]
[-c:a AUDIO_CODEC] [-b:a AUDIO_BITRATE] [-ar SAMPLE_RATE] [-koa]
[-prf PRE_FILTER] [-pof POST_FILTER]
[-vn] [-c:v VIDEO_CODEC]
[-sn] [-mn] [-cn]
[-ei EXTRA_INPUT_OPTIONS] [-e EXTRA_OUTPUT_OPTIONS]
[-ofmt OUTPUT_FORMAT]
[-ext EXTENSION]
ffmpeg-normalize: error: ambiguous option: -c could match -c:a, -c:v, -cn
$ ffmpeg-normalize input.mp4 -c copy -o sol4.mp4
usage: ffmpeg-normalize input [input ...]
[-h]
[-o OUTPUT [OUTPUT ...]] [-of OUTPUT_FOLDER]
[-f] [-d] [-v] [-q] [-n] [-pr]
[--version]
[-nt {ebu,rms,peak}] [-t TARGET_LEVEL] [-p]
[-lrt LOUDNESS_RANGE_TARGET] [-tp TRUE_PEAK] [--offset OFFSET] [--dual-mono]
[-c:a AUDIO_CODEC] [-b:a AUDIO_BITRATE] [-ar SAMPLE_RATE] [-koa]
[-prf PRE_FILTER] [-pof POST_FILTER]
[-vn] [-c:v VIDEO_CODEC]
[-sn] [-mn] [-cn]
[-ei EXTRA_INPUT_OPTIONS] [-e EXTRA_OUTPUT_OPTIONS]
[-ofmt OUTPUT_FORMAT]
[-ext EXTENSION]
ffmpeg-normalize: error: ambiguous option: -c could match -c:a, -c:v, -cn
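The error above has the shape of a Python argparse abbreviation clash: a bare `-c` is a prefix of `-c:a`, `-c:v`, and `-cn`, so the parser cannot tell which one is meant. The sketch below reproduces that behaviour with a small stand-in parser. It assumes ffmpeg-normalize builds its CLI on argparse (the usage text suggests so, but this is not confirmed here); `build_parser` and `RaisingParser` are hypothetical names, not part of ffmpeg-normalize.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """Raise instead of printing usage and exiting, so the error can be inspected."""

    def error(self, message):
        raise ValueError(message)


def build_parser():
    # Hypothetical stand-in with the three options named in the error message.
    parser = RaisingParser(prog="demo-normalize")
    parser.add_argument("input", nargs="+")
    parser.add_argument("-o", "--output", nargs="+")
    parser.add_argument("-c:a", "--audio-codec")
    parser.add_argument("-c:v", "--video-codec", default="copy")
    parser.add_argument("-cn", "--chapters-disable", action="store_true")
    return parser


def try_parse(argv):
    """Return (namespace, None) on success or (None, error message) on failure."""
    try:
        return build_parser().parse_args(argv), None
    except ValueError as exc:
        return None, str(exc)


# A bare "-c" matches several options by prefix and is rejected.
_, bare_c_error = try_parse(["input.mp4", "-c", "copy", "-o", "out.mp4"])

# Spelling the option out in full parses cleanly in this sketch.
explicit_args, explicit_error = try_parse(
    ["input.mp4", "-c:a", "aac", "-c:v", "copy", "-o", "out.mp4"]
)
```

Note that the debug log of the original run already shows `'-c:v', 'copy'` in the second-pass ffmpeg command, so the video stream was being copied without any extra flag.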
Drifting timestamps means that there will be an offset between audio and video that gets larger and larger.
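One rough way to look for this is to compare the per-stream durations that ffprobe reports for the input and the output. The sketch below is only a coarse check under stated assumptions: it assumes `ffprobe` is on the PATH, that the container reports a duration for each stream, and that drift shows up as a growing gap between audio and video duration (it may not). The function names are made up for illustration.

```python
import json
import subprocess


def probe(path, ffprobe="ffprobe"):
    """Run ffprobe and return its JSON report of stream types and durations."""
    cmd = [
        ffprobe, "-v", "error",
        "-show_entries", "stream=codec_type,duration",
        "-of", "json",
        path,
    ]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def stream_durations(ffprobe_json):
    """Map codec_type ('audio', 'video', ...) to duration in seconds."""
    durations = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        # Some containers do not report a per-stream duration; skip those.
        if "duration" in stream:
            # Keep only the first stream of each type.
            durations.setdefault(stream["codec_type"], float(stream["duration"]))
    return durations


def av_duration_gap(ffprobe_json):
    """Audio duration minus video duration, in seconds."""
    durations = stream_durations(ffprobe_json)
    return durations["audio"] - durations["video"]


# Usage (not executed here):
#   gap_in = av_duration_gap(probe("input.mp4"))
#   gap_out = av_duration_gap(probe("output.mp4"))
#   print(f"input gap {gap_in:+.3f} s, output gap {gap_out:+.3f} s")
```

If the output's gap is much larger than the input's, the audio stream was likely stretched or shortened relative to the video. A matching gap does not rule out drift, so checking sync by ear near the end of the file is still worthwhile.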
What happens when you do:
ffmpeg -i input.mp4 -filter_complex "[0:a]loudnorm[a0]" -map "[a0]" -map 0:v -c:a aac -c:v copy output.mp4
This one gives drifting timestamps in output.mp4
In that case, this is an upstream bug in ffmpeg that I cannot do much about. It seems to be an issue with the loudnorm filter itself. Can you supply an input file that reproduces the issue? If so, it would be good to post it as a bug report on https://trac.ffmpeg.org/.
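If the original recording cannot be shared, a synthetic file is sometimes enough for a bug report. The sketch below only builds an ffmpeg command that generates a test pattern with a periodic beep from the lavfi `testsrc2` and `sine` sources. Whether such a file triggers the drift is untested and purely an assumption; `build_repro_command` is a hypothetical helper, and the size, rate, and duration values are arbitrary.

```python
def build_repro_command(output, seconds=600, ffmpeg="ffmpeg"):
    """Build (but do not run) an ffmpeg command for a synthetic A/V test file."""
    video_src = f"testsrc2=size=320x240:rate=30:duration={seconds}"
    # beep_factor adds a periodic higher-pitched beep, which makes
    # audio/video offsets easier to notice when playing the file back.
    audio_src = f"sine=frequency=1000:beep_factor=4:sample_rate=44100:duration={seconds}"
    return [
        ffmpeg, "-y",
        "-f", "lavfi", "-i", video_src,
        "-f", "lavfi", "-i", audio_src,
        "-c:v", "libx264",
        "-c:a", "aac",
        "-shortest",
        output,
    ]


# Usage (not executed here):
#   import subprocess
#   subprocess.run(build_repro_command("repro.mp4", seconds=3 * 3600), check=True)
```

Running the loudnorm command from earlier in this thread on the generated file, and comparing against a plain `-c:a aac` re-encode, would show whether the filter alone introduces the offset.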
Possibly related: https://github.com/slhck/ffmpeg-normalize/issues/146#issuecomment-1047640861
Closing this for now; if it appears again, let's reopen with a repro case.