Audio-video desync for a three-hour long video

Open ning-y opened this issue 4 years ago • 5 comments

:warning: Please read this carefully! If you do not fill out this information, your bug report may be closed.

Expected behavior: ffmpeg-normalize should produce output in which audio and video are in sync.

Actual behavior: For a long video, ffmpeg-normalize produces output with desynced audio and video. The audio runs ahead of the video: a sound is heard before the corresponding action appears on screen. The desync grows as the video plays.

Command: The exact command you were trying to run:

ffmpeg-normalize --debug input.mp4 -c:a aac -o output.mp4

Any output you get when running the command with the --debug flag:

DEBUG: found executable in path: /home/ning/miniconda3/envs/armada/bin/ffmpeg
DEBUG: found executable in path: /home/ning/miniconda3/envs/armada/bin/ffmpeg
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-filters']
DEBUG: Parsing streams of input.mp4
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-i', 'input.mp4', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', '/dev/null']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version 5.0 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 10.3.0 (GCC)
  configuration: --prefix=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl
--enable-demuxer=dash --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-vaapi --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-libvpx --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/pkg-config
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.16.100
  Duration: 03:12:16.80, start: 0.000000, bitrate: 3074 kb/s
  Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 130 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, unknown/bt470bg/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 2928 kb/s, 60 fps, 60 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Output #0, null, to '/dev/null':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf59.16.100
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 130 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, unknown/bt470bg/unknown, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 2928 kb/s, 60 fps, 60 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame=    2 fps=0.0 q=-1.0 Lsize=N/A time=00:00:00.00 bitrate=N/A speed=   0x
video:96kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

DEBUG: Found duration: 11536.08 s
DEBUG: Found audio stream at index 0
DEBUG: Found video stream at index 1
INFO: Normalizing file input.mp4 (1 of 1)
DEBUG: Running normalization for input.mp4
DEBUG: Parsing normalization info for input.mp4
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-nostdin', '-y', '-i', 'input.mp4', '-filter_complex', '[0:0]loudnorm=i=-23.0:lra=7.0:tp=-2.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', '/dev/null']

(Missing output in the middle, here, because the amount of output surpassed my terminal emulator's max lines.)

bitrate=N/A
total_size=N/A
out_time_us=11523300000
out_time_ms=11523300000
out_time=03:12:03.300000
dup_frames=0
drop_frames=0
speed=31.5x
progress=continue
size=N/A time=03:12:12.88 bitrate=N/A speed=31.5x
bitrate=N/A
total_size=N/A
out_time_us=11532887078
out_time_ms=11532887078
out_time=03:12:12.887078
dup_frames=0
drop_frames=0
speed=31.5x
progress=end
video:0kB audio:8649665kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x5623ceabf400]
{
"input_i" : "-16.92",
"input_tp" : "2.72",
"input_lra" : "17.70",
"input_thresh" : "-28.76",
"output_i" : "-23.33",
"output_tp" : "-2.00",
"output_lra" : "11.20",
"output_thresh" : "-34.57",
"normalization_type" : "dynamic",
"target_offset" : "0.33"
}
DEBUG: Loudnorm stats parsed: {"input_i": "-16.92", "input_tp": "2.72", "input_lra": "17.70", "input_thresh": "-28.76", "output_i": "-23.33", "output_tp": "-2.00", "output_lra": "11.20", "output_thresh": "-34.57", "normalization_type": "dynamic", "target_offset": "0.33"}
INFO: Running second pass for input.mp4
WARNING: The sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
DEBUG: Running command: ['/home/ning/miniconda3/envs/armada/bin/ffmpeg', '-y', '-nostdin', '-i', 'input.mp4', '-filter_complex', '[0:0]loudnorm=i=-23.0:lra=7.0:tp=-2.0:offset=0.33:measured_i=-16.92:measured_lra=17.7:measured_tp=2.72:measured_thresh=-28.76:linear=true:print_format=json[norm0]', '-map_metadata', '0', '-map_metadata:s:a:0', '0:s:a:0', '-map_metadata:s:v:0', '0:s:v:0', '-map_chapters', '0', '-map', '0:1', '-c:v', 'copy', '-map', '[norm0]', '-c:a', 'aac', '-c:s', 'copy', '/tmp/ble6nigo.mp4']

(Again, omitting output that had been lost to my terminal's max lines.)

frame=692116
fps=858.24
stream_0_0_q=-1.0
bitrate=3058.5kbits/s
total_size=4410048560
out_time_us=11535216732
out_time_ms=11535216732
out_time=03:12:15.216732
dup_frames=0
drop_frames=0
speed=14.3x
progress=continue
frame=692207 fps=858 q=-1.0 Lsize= 4337200kB time=03:12:16.73 bitrate=3079.8kbits/s speed=14.3x
frame=692207
fps=857.92
stream_0_0_q=-1.0
bitrate=3079.8kbits/s
total_size=4441292419
out_time_us=11536733398
out_time_ms=11536733398
out_time=03:12:16.733398
dup_frames=0
drop_frames=0
speed=14.3x
progress=end
video:4123902kB audio:183378kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.694622%
[Parsed_loudnorm_0 @ 0x558920476b40]
{
"input_i" : "-16.92",
"input_tp" : "2.72",
"input_lra" : "17.70",
"input_thresh" : "-28.76",
"output_i" : "-22.99",
"output_tp" : "-2.00",
"output_lra" : "11.10",
"output_thresh" : "-34.24",
"normalization_type" : "dynamic",
"target_offset" : "-0.01"
}
[aac @ 0x5589203227c0] Qavg: 771.614
DEBUG: Moving temporary file from /tmp/ble6nigo.mp4 to output.mp4
DEBUG: Normalization finished
INFO: Normalized file written to output.mp4
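
The two-pass flow visible in the debug log above (measure with loudnorm, then feed the measured values back into a second loudnorm run) can be sketched as follows. This is a minimal illustration and not ffmpeg-normalize's actual implementation: the function names `parse_loudnorm_stats` and `second_pass_filter` are hypothetical, and the defaults simply mirror the `i`, `lra` and `tp` values shown in the log.

```python
import json


def parse_loudnorm_stats(stderr_text):
    """Return the JSON stats printed after the last '[Parsed_loudnorm' marker."""
    marker = stderr_text.rindex("[Parsed_loudnorm")
    start = stderr_text.index("{", marker)
    # The stats object is flat, so the first '}' after '{' closes it.
    end = stderr_text.index("}", start) + 1
    return json.loads(stderr_text[start:end])


def second_pass_filter(stats, i=-23.0, lra=7.0, tp=-2.0):
    """Build a second-pass loudnorm filter string from first-pass stats."""
    parts = [
        f"i={i}",
        f"lra={lra}",
        f"tp={tp}",
        f"offset={float(stats['target_offset'])}",
        f"measured_i={float(stats['input_i'])}",
        f"measured_lra={float(stats['input_lra'])}",
        f"measured_tp={float(stats['input_tp'])}",
        f"measured_thresh={float(stats['input_thresh'])}",
        "linear=true",
        "print_format=json",
    ]
    return "loudnorm=" + ":".join(parts)
```

Fed the first-pass stats from this report, the sketch yields the same filter string that appears in the second-pass command, minus the `[0:0]` and `[norm0]` stream labels.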

Environment (please complete the following information):

  • [x] Your operating system
  • [x] Your Python version / distribution (python --version)
  • [x] Your ffmpeg version (ffmpeg -version)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux bookworm/sid
Release:        testing
Codename:       bookworm
$ python --version
Python 3.10.2
$ ffmpeg -version
ffmpeg version 5.0 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 10.3.0 (GCC)
configuration: --prefix=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/x86_64-conda-linux-gnu-cc --disable-doc --disable-openssl --enable-demuxer=dash --enable-gnutls --enable-gpl --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-vaapi --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-libvpx --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1646229198505/_build_env/bin/pkg-config
libavutil      57. 17.100 / 57. 17.100
libavcodec     59. 18.100 / 59. 18.100
libavformat    59. 16.100 / 59. 16.100
libavdevice    59.  4.100 / 59.  4.100
libavfilter     8. 24.100 /  8. 24.100
libswscale      6.  4.100 /  6.  4.100
libswresample   4.  3.100 /  4.  3.100
libpostproc    56.  3.100 / 56.  3.100

ning-y avatar Mar 21 '22 18:03 ning-y

Thanks for providing a detailed report. With these kinds of errors, there probably isn't much ffmpeg-normalize can do about it, since it is just calling ffmpeg to do the work.

Just to make sure: the input itself has no drifting timestamps?

Does the following reproduce the error?

ffmpeg -i input.mp4 -c:a aac -c:v copy output.mp4

Does the following also exhibit the same drift?

ffmpeg -i input.mp4 -c copy output.mp4

slhck avatar Mar 21 '22 19:03 slhck

I'm not sure what a drifting timestamp is. The input.mp4 plays fine with audio/video in sync. Is there a way I can otherwise check for a drifting timestamp?

Both of the commands you provided produced an output without desync. I tried using the same arguments with ffmpeg-normalize, but there was a syntax error (you probably already saw this coming):

$ ffmpeg-normalize input.mp4 -c:a aac -c:v copy -o sol3.mp4
WARNING: The sample rate will automatically be set to 192 kHz by the loudnorm filter. Specify -ar/--sample-rate to override it.
usage: ffmpeg-normalize input [input ...]
    [-h]
    [-o OUTPUT [OUTPUT ...]] [-of OUTPUT_FOLDER]
    [-f] [-d] [-v] [-q] [-n] [-pr]
    [--version]
    [-nt {ebu,rms,peak}] [-t TARGET_LEVEL] [-p]
    [-lrt LOUDNESS_RANGE_TARGET] [-tp TRUE_PEAK] [--offset OFFSET] [--dual-mono]
    [-c:a AUDIO_CODEC] [-b:a AUDIO_BITRATE] [-ar SAMPLE_RATE] [-koa]
    [-prf PRE_FILTER] [-pof POST_FILTER]
    [-vn] [-c:v VIDEO_CODEC]
    [-sn] [-mn] [-cn]
    [-ei EXTRA_INPUT_OPTIONS] [-e EXTRA_OUTPUT_OPTIONS]
    [-ofmt OUTPUT_FORMAT]
    [-ext EXTENSION]
ffmpeg-normalize: error: ambiguous option: -c could match -c:a, -c:v, -cn
$ ffmpeg-normalize input.mp4 -c copy -o sol4.mp4
usage: ffmpeg-normalize input [input ...]
    [-h]
    [-o OUTPUT [OUTPUT ...]] [-of OUTPUT_FOLDER]
    [-f] [-d] [-v] [-q] [-n] [-pr]
    [--version]
    [-nt {ebu,rms,peak}] [-t TARGET_LEVEL] [-p]
    [-lrt LOUDNESS_RANGE_TARGET] [-tp TRUE_PEAK] [--offset OFFSET] [--dual-mono]
    [-c:a AUDIO_CODEC] [-b:a AUDIO_BITRATE] [-ar SAMPLE_RATE] [-koa]
    [-prf PRE_FILTER] [-pof POST_FILTER]
    [-vn] [-c:v VIDEO_CODEC]
    [-sn] [-mn] [-cn]
    [-ei EXTRA_INPUT_OPTIONS] [-e EXTRA_OUTPUT_OPTIONS]
    [-ofmt OUTPUT_FORMAT]
    [-ext EXTENSION]
ffmpeg-normalize: error: ambiguous option: -c could match -c:a, -c:v, -cn
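
The "ambiguous option" error above is the message Python's argparse emits when a single-dash argument is a prefix of several registered options. A minimal sketch reproducing it, assuming a hypothetical stand-in parser that registers only the three options named in the error (the `dest` names are made up for illustration):

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical stand-in: only the three options named in the error message.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-c:a", dest="audio_codec")
    parser.add_argument("-c:v", dest="video_codec")
    parser.add_argument("-cn", dest="chapters_disable", action="store_true")
    return parser


def parse_or_error(argv):
    """Return parsed args, or the text argparse prints before it exits."""
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            return build_parser().parse_args(argv)
    except SystemExit:
        return stderr.getvalue()


# "-c" is a prefix of all three options, so argparse refuses to guess.
ambiguous = parse_or_error(["-c", "copy"])
# The exact option string parses normally.
exact = parse_or_error(["-c:v", "copy"])
```

Note that the second-pass command in the original debug log already contains `'-c:v', 'copy'`, so the video stream was being copied in the failing run as well.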

ning-y avatar Mar 22 '22 06:03 ning-y

Drifting timestamps means that there will be an offset between audio and video that gets larger and larger.
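
One rough way to check a file for this kind of drift is to compare the durations of its audio and video streams with ffprobe and express the mismatch as a rate. The sketch below assumes ffprobe is on the PATH and that the container reports per-stream durations (MP4 usually does; some formats print `N/A`, which would make `float()` fail). The helper names are hypothetical, and a duration mismatch is only a hint, not proof of drift.

```python
import subprocess


def ffprobe_duration_cmd(path, stream):
    """Build an ffprobe command that prints one stream's duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", stream,  # e.g. "a:0" or "v:0"
        "-show_entries", "stream=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def stream_duration(path, stream):
    """Run ffprobe and return the stream duration as a float (seconds)."""
    result = subprocess.run(
        ffprobe_duration_cmd(path, stream),
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def drift_ms_per_minute(audio_s, video_s):
    """Average mismatch in ms per minute, assuming it accumulates linearly.

    Negative means the audio stream is shorter than the video stream.
    """
    return (audio_s - video_s) / video_s * 60_000
```

Usage would be along the lines of `drift_ms_per_minute(stream_duration("output.mp4", "a:0"), stream_duration("output.mp4", "v:0"))`, run on both the input and the output for comparison.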

What happens when you do:

ffmpeg -i input.mp4 -filter_complex "[0:a]loudnorm[a0]" -map "[a0]" -map 0:v -c:a aac -c:v copy output.mp4

slhck avatar Mar 22 '22 08:03 slhck

This one gives drifting timestamps in output.mp4
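
The three ffmpeg commands tried in this thread form a ladder from simplest to most involved: stream copy, audio re-encode, then audio re-encode through loudnorm. A small sketch of that isolation logic, with hypothetical output file names and helper names, and with the sync check left as a manual step:

```python
STAGES = ("copy", "aac", "loudnorm")


def isolation_commands(src, prefix="test"):
    """Return the thread's three ffmpeg invocations, keyed by stage name."""
    return {
        "copy": ["ffmpeg", "-i", src, "-c", "copy", f"{prefix}_copy.mp4"],
        "aac": ["ffmpeg", "-i", src, "-c:a", "aac", "-c:v", "copy",
                f"{prefix}_aac.mp4"],
        "loudnorm": ["ffmpeg", "-i", src,
                     "-filter_complex", "[0:a]loudnorm[a0]",
                     "-map", "[a0]", "-map", "0:v",
                     "-c:a", "aac", "-c:v", "copy",
                     f"{prefix}_loudnorm.mp4"],
    }


def first_failing_stage(in_sync):
    """Given stage -> True if the output played in sync, return the first stage that did not."""
    for stage in STAGES:
        if not in_sync[stage]:
            return stage
    return None
```

With the results reported here (copy and AAC re-encode in sync, loudnorm not), `first_failing_stage` returns `"loudnorm"`.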

ning-y avatar Mar 23 '22 12:03 ning-y

In that case, this is an upstream bug in ffmpeg that I cannot do much about. It seems to be an issue with the loudnorm filter itself. Can you supply an input file with which the issue can be reproduced? If so, it would be good to post it on https://trac.ffmpeg.org/ as a bug report.

slhck avatar Mar 23 '22 13:03 slhck

Possibly related: https://github.com/slhck/ffmpeg-normalize/issues/146#issuecomment-1047640861

mgrachten avatar Oct 13 '22 11:10 mgrachten

Closing this for now. If it appears again, let's reopen with a repro case.

slhck avatar Dec 13 '23 08:12 slhck