[bug] SageMaker Pytorch image has compatibility issues between ffmpeg version and torchaudio.io.StreamReader
Checklist
- [x] I've prepended issue tag with type of change: [bug]
- [x] (If applicable) I've attached the script to reproduce the bug
- [x] (If applicable) I've documented below the DLC image/dockerfile this relates to
- [x] (If applicable) I've documented below the tests I've run on the DLC image
- [x] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- [ ] I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description: torchaudio.io.StreamReader requires an ffmpeg version from 4.1 to 4.4, but the current SageMaker PyTorch training image ships ffmpeg 5.1.2, so StreamReader fails to read video files.
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-cpu-py310-ubuntu20.04-sagemaker
Current behavior:
I first ran docker run -it --gpus all 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker /bin/bash to create a container from the docker image.
In the first test, I ran the following script to check the availability of ffmpeg from torchaudio
import torch
import torchaudio
from torchaudio.utils import ffmpeg_utils
print(torch.__version__)
print(torchaudio.__version__)
print(ffmpeg_utils.get_versions())
print(ffmpeg_utils.get_build_config())
print([k for k in ffmpeg_utils.get_video_decoders().keys() if 'cuvid' in k])
It failed with the following traceback:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 85, in _init_ffmpeg
    _load_lib("libtorchaudio_ffmpeg")
  File "/opt/conda/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 61, in _load_lib
    torch.ops.load_library(path)
  File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "/opt/conda/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libavdevice.so.58: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 134, in wrapped
    _init_ffmpeg()
  File "/opt/conda/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 87, in _init_ffmpeg
    raise ImportError("FFmpeg libraries are not found. Please install FFmpeg.") from err
ImportError: FFmpeg libraries are not found. Please install FFmpeg.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/test/test_torchaudio.py", line 8, in <module>
    print(ffmpeg_utils.get_versions())
  File "/opt/conda/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 136, in wrapped
    raise RuntimeError(
RuntimeError: get_versions requires FFmpeg extension which is not available. Please refer to the stacktrace above for how to resolve this.
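The root failure in this traceback is the dynamic loader not finding libavdevice.so.58. A rough way to see which FFmpeg 4.x sonames are visible is to try loading each one with ctypes. This is only a sketch: probe_shared_libs and FFMPEG4_SONAMES are hypothetical names, and every soname except libavdevice.so.58 is assumed from the FFmpeg 4.x library major versions. The check only reflects the loader's default search path, so libraries under /opt/conda/lib may be reported as missing unless that directory is on the search path (for example through LD_LIBRARY_PATH).

```python
import ctypes

# libavdevice.so.58 comes from the traceback; the other sonames are
# assumed from the FFmpeg 4.x library major versions.
FFMPEG4_SONAMES = [
    "libavutil.so.56",
    "libavcodec.so.58",
    "libavformat.so.58",
    "libavfilter.so.7",
    "libavdevice.so.58",
]


def probe_shared_libs(sonames):
    """Return {soname: bool}, True when the loader can open the library."""
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot locate or open the library.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_shared_libs(FFMPEG4_SONAMES).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If every entry is reported missing, that would be consistent with an image that only carries the ffmpeg 5.x libraries.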
In the second test, I created some test video files
mkdir /opt/test
cd /opt/test
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p10le -vtag hvc1 -y test_hevc_hdr.mp4
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p -vtag hvc1 -y test_hevc_sdr.mp4
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx264 -pix_fmt yuv420p -vtag avc1 -y test_h264_sdr.mp4
and ran the following script in the same folder
from torchaudio.io import StreamReader

def test_func(src: str, decoder: str, device: str = 'cpu'):
    if device == 'cuda':
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': f'{decoder}_cuvid',
            'hw_accel': 'cuda',
            "format": None,
        }
    else:
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': decoder,
            "decoder_option": {"threads": "0"},
            "format": "yuv420p",
        }
    video = StreamReader(src=src)
    video.add_basic_video_stream(1, **decode_config)
    stream = video.stream()
    frame, = next(stream)
    print(frame.device, frame.shape, frame.dtype)
    return frame

if __name__ == "__main__":
    test_videos = ['test_hevc_hdr.mp4', 'test_hevc_sdr.mp4', 'test_h264_sdr.mp4']
    decoders = ['hevc', 'hevc', 'h264']
    devices = ['cpu', 'cuda']
    for src_path, decoder in zip(test_videos, decoders):
        for device in devices:
            test_func(src_path, decoder, device)
It failed with the same error as the first test.
Expected behavior: The expected output of the first test is something like
2.0.0
2.0.0
{'libavutil': (56, 70, 100), 'libavcodec': (58, 134, 100), 'libavformat': (58, 76, 100), 'libavfilter': (7, 110, 100), 'libavdevice': (58, 13, 100)}
--prefix=/home/ubuntu/.conda/envs/torchqa --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-cc --cxx=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-c++ --nm=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-nm --ar=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-ar --disable-doc --disable-openssl --enable-avresample --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-pthreads --enable-vaapi --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/pkg-config
['av1_cuvid', 'h264_cuvid', 'hevc_cuvid', 'mjpeg_cuvid', 'mpeg1_cuvid', 'mpeg2_cuvid', 'mpeg4_cuvid', 'vc1_cuvid', 'vp8_cuvid', 'vp9_cuvid']
The expected output of the second test should be
cpu torch.Size([1, 3, 480, 640]) torch.uint8
cuda:0 torch.Size([1, 3, 480, 640]) torch.int16
cpu torch.Size([1, 3, 480, 640]) torch.uint8
cuda:0 torch.Size([1, 3, 480, 640]) torch.uint8
cpu torch.Size([1, 3, 480, 640]) torch.uint8
cuda:0 torch.Size([1, 3, 480, 640]) torch.uint8
Additional context:
As far as I know, ffmpeg installed with conda install ffmpeg=4.4.2 -c conda-forge works well with StreamReader in torchaudio 2.0.1. However, I cannot uninstall ffmpeg 5.1.2 and reinstall ffmpeg 4.4.2 in this image, because conda reports the environment as inconsistent and fails to resolve it.
Hi @w238liu,
Have you tried installing the latest PyTorch 2.0.1 image ( 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.1-gpu-py310-cu118-ubuntu20.04-ec2 )? That might solve the issue you are seeing.
I pulled the latest image, ran your first test, and got the expected output:
> python3 test.py
2.0.1
2.0.2
{'libavutil': (56, 70, 100), 'libavcodec': (58, 134, 100), 'libavformat': (58, 76, 100), 'libavfilter': (7, 110, 100), 'libavdevice': (58, 13, 100)}
--prefix=/opt/conda --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-cc --cxx=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-c++ --nm=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-nm --ar=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-ar --disable-doc --disable-openssl --enable-avresample --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-pthreads --enable-vaapi --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/pkg-config
['av1_cuvid', 'h264_cuvid', 'hevc_cuvid', 'mjpeg_cuvid', 'mpeg1_cuvid', 'mpeg2_cuvid', 'mpeg4_cuvid', 'vc1_cuvid', 'vp8_cuvid', 'vp9_cuvid']
This is still using ffmpeg=4.4.2 as seen in this call:
> conda list | grep ffmpeg
ffmpeg 4.4.2 gpl_h8dda1f0_112 conda-forge
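As a cross-check, the libavcodec major version reported by get_versions() (58 in the output above) can be mapped to an FFmpeg release series. The sketch below is illustrative only: infer_ffmpeg_major and the mapping table are hypothetical helpers, and the table is an assumption based on FFmpeg's library versioning, not something taken from torchaudio.

```python
# Assumed mapping from libavcodec major version to FFmpeg release series.
LIBAVCODEC_TO_FFMPEG = {58: 4, 59: 5, 60: 6, 61: 7}


def infer_ffmpeg_major(versions):
    """Map a get_versions()-style dict to an FFmpeg major release, or None."""
    libavcodec_major = versions["libavcodec"][0]
    return LIBAVCODEC_TO_FFMPEG.get(libavcodec_major)


# Version tuples copied from the output above.
reported = {
    "libavutil": (56, 70, 100),
    "libavcodec": (58, 134, 100),
    "libavformat": (58, 76, 100),
    "libavfilter": (7, 110, 100),
    "libavdevice": (58, 13, 100),
}
print(infer_ffmpeg_major(reported))  # prints 4
```

A result of 4 agrees with the ffmpeg 4.4.2 entry in the conda listing.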
@ohadkatz Hello, thanks for the suggestion. I just tried this image. It works on an EC2 machine, but not on SageMaker. Is there any plan to release a similar image for SageMaker?
Moreover, even on an EC2 machine with the EC2 image, the second test script errors out with a Segmentation fault. Any thoughts?
Hi @w238liu, we have released the SageMaker containers with ffmpeg 4.4.2 installed.
That fixes the first issue you mentioned. Here is the release tag: https://github.com/aws/deep-learning-containers/releases/tag/v1.4-pt-sagemaker-2.0.1-tr-gpu-py310
On the second issue, I was able to reproduce the segmentation fault both on this container and with upstream torch and torchaudio installed via conda install pytorch=2.0.1 pytorch-cuda=11.8 torchaudio -c pytorch -c nvidia -c defaults. To debug this, I enabled the Python faulthandler via export PYTHONFAULTHANDLER=1 and got the result below:
cpu torch.Size([1, 3, 480, 640]) torch.uint8
Fatal Python error: Segmentation fault
Current thread 0x00007f1aaf63a740 (most recent call first):
File "/opt/conda/lib/python3.10/site-packages/torchaudio/io/_stream_reader.py", line 753 in add_video_stream
File "/opt/conda/lib/python3.10/site-packages/torchaudio/io/_stream_reader.py", line 668 in add_basic_video_stream
File "//test.py", line 22 in test_func
File "//test.py", line 38 in <module>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, gmpy2.gmpy2 (total: 21)
Segmentation fault (core dumped)
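For anyone reproducing this, the same fault traces can also be enabled from inside the script with the standard-library faulthandler module instead of the environment variable. A minimal sketch, where enable_fault_traces and the log file name are hypothetical:

```python
import faulthandler


def enable_fault_traces(log_path):
    """Enable faulthandler and write crash tracebacks to log_path.

    Returns the open file object. It must stay open for as long as
    faulthandler is enabled, because the handler writes to its file
    descriptor when a fatal signal arrives.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Hypothetical log file name; call this before creating the StreamReader.
    log_file = enable_fault_traces("fault.log")
```

Writing to a file rather than stderr keeps the trace available when the container's stderr is not captured.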
~~Based on the above, the issue occurs at https://github.com/pytorch/audio/blob/v2.0.2/torchaudio/io/_stream_reader.py#L753~~ ~~For the next step, i will create an issue to pytorch/audio and get help.~~
Digging deeper today, I realized that the ffmpeg from the pytorch channel doesn't support any of the *_cuvid decoders (see below), and that the ffmpeg used in this image (4.4.2 from conda-forge) shouldn't be the one that gets installed, since we want to stick to the pytorch distribution of ffmpeg.
Below is the output of the first reproduction script with the pytorch distribution of ffmpeg. Note the empty list [] at the end, which is the result of print([k for k in ffmpeg_utils.get_video_decoders().keys() if 'cuvid' in k]):
2.0.1
2.0.2
{'libavutil': (56, 51, 100), 'libavcodec': (58, 91, 100), 'libavformat': (58, 45, 100), 'libavfilter': (7, 85, 100), 'libavdevice': (58, 10, 100)}
--prefix=/fsx/conda/envs/test_oss_audio --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
[]
So for the second issue you mentioned, our plan is to change ffmpeg back to the pytorch distribution here, which means that the *_cuvid decoders won't be supported. Please let us know if you have any concerns about this change.
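With an ffmpeg build that lacks the *_cuvid decoders, a script could guard the GPU path by checking the decoder list first and falling back to CPU decoding. The sketch below reuses the two configurations from the reproduction script. pick_decode_config is a hypothetical helper, and whether a CPU fallback is acceptable depends on the workload.

```python
def pick_decode_config(decoder, available_decoders):
    """Return (device, kwargs) for StreamReader.add_basic_video_stream.

    The CUDA settings are used only when '<decoder>_cuvid' is present in
    available_decoders; otherwise the CPU settings are returned.
    """
    gpu_decoder = f"{decoder}_cuvid"
    if gpu_decoder in available_decoders:
        return "cuda", {
            "buffer_chunk_size": 50,
            "decoder": gpu_decoder,
            "hw_accel": "cuda",
            "format": None,
        }
    return "cpu", {
        "buffer_chunk_size": 50,
        "decoder": decoder,
        "decoder_option": {"threads": "0"},
        "format": "yuv420p",
    }


# In a real script the list would come from
# ffmpeg_utils.get_video_decoders().keys(); an empty list models the
# ffmpeg build without cuvid decoders.
device, config = pick_decode_config("hevc", [])
print(device, config["decoder"])  # prints: cpu hevc
```

The returned kwargs can be passed as video.add_basic_video_stream(1, **config), as in the original test.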
Hi @junpuf, thanks for your work on this issue and your analysis.
For the second issue, I believe it is now fixed by the release of torchaudio 2.1. See this closed issue and the release note for details.
Regarding ffmpeg, I do have concerns about switching from the conda-forge distribution to the pytorch distribution. Without the CUDA decoders, I/O could be very slow for UHD HDR videos.
Hi @w238liu, thanks for the update. I read the issue you opened and confirmed that the official documentation recommends using 'ffmpeg<7' from conda-forge (link).
Since we recently released the PyTorch 2.1 container, I will take that image, override ffmpeg, re-run the two test cases, and get back to you.
Hi @junpuf, is there any update on the two test cases? I have recently been working with some 4K HDR videos, and decoding them on CPUs is very slow. Is there any plan to release a container that supports the torchaudio GPU decoder in the near future?
Hi @w238liu, today I'm checking whether GPU decoding can be enabled on the PyTorch Training Container with ffmpeg 6 from conda-forge.
First, I ran the test cases you provided on a g5 EC2 instance with 2 GPUs. I installed pytorch, torchaudio, etc. into a conda environment, then installed ffmpeg 6 from conda-forge into the same environment, and got the expected results. Below are the commands I used to create the conda environment:
mamba create -n myenv python=3.10 pytorch=2.2.0 pytorch-cuda=12.1 torchaudio --strict-channel-priority --override-channels -c https://aws-ml-conda.s3.us-west-2.amazonaws.com -c nvidia -c conda-forge
source activate myenv
mamba install ffmpeg=6 -c conda-forge
However, when I replicate the same setup in the container environment, I consistently get the error below, and I don't have a solution at the moment.
RuntimeError: Failed to initialize CodecContext: Operation not permitted
Exception raised from open_codec at /opt/conda/conda-bld/torchaudio_1706759466457/work/src/libtorio/ffmpeg/stream_reader/stream_processor.cpp:150