TorchAudio Dispatcher Migration
Overview
We propose the following end state for TorchAudio’s I/O functions info, load, save:
- FFmpeg is the primary backend for TorchAudio’s I/O functions info, load, save.
- FFmpeg-, SoX-, and soundfile-based backends are user-selectable from said I/O functions and are no longer determined by global state.
- All of FFmpeg, SoX, and soundfile are optional dependencies.
- I/O functions no longer support TorchScript.
Context
TorchAudio’s functions info, load, and save currently rely on two third-party libraries: SoX and soundfile. Whereas SoX is used in the Linux and Mac distributions, soundfile is used in the Windows distribution.
Through the years, we’ve encountered several issues with SoX:
- Its handling of in-memory decoding is buggy and requires a local patch to fix.
- It accesses the internal structure of the FILE* object. As a result, SoX 14.4.2 does not compile on Windows with MSVC versions newer than 2013, which prevents us from using SoX on all platforms.
- It attempts to rewind stdin.
- It has not been actively developed or maintained since 2015, which gives us little confidence that the aforementioned issues will be addressed.
- It has caused other user-facing problems:
  - https://github.com/pytorch/audio/issues/2870
  - https://github.com/pytorch/audio/issues/2356
Separately, our work on streaming I/O introduced FFmpeg as a dependency. FFmpeg's advantages over SoX include the following:
- It’s battle-tested: it has been developed for over 20 years and is widely used in industry.
- The library code is portable across Linux, Mac, and Windows.
- It supports a wide variety of codecs, from basic to advanced, for both audio and video.
- It supports GPU acceleration in decoding and encoding.
- The C API offers a high degree of customizability.
- It abstracts away details such as codecs, container formats, and devices.
- It supports custom I/O, which allows in-memory decoding and encoding to be implemented on top of the file-like object protocol.
- It’s being actively developed, with the latest version (5.1) having been released in July 2022.
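FFmpeg's libavformat accepts caller-supplied read and seek callbacks (through `avio_alloc_context`) in place of a file path, which is what makes decoding from a file-like object possible. The Python sketch below only mirrors the shape of those callbacks on top of a file-like object such as `io.BytesIO`. The `FileObjectIO` class is hypothetical; a real binding would copy bytes into FFmpeg's buffer from C or C++ and return an FFmpeg error code such as `AVERROR_EOF` at end of stream, rather than the empty `bytes` used here.

```python
import io
import os

# Flags defined in FFmpeg's avio.h. AVSEEK_SIZE in `whence` asks the seek
# callback to report the stream size; AVSEEK_FORCE is a hint that can be ignored.
AVSEEK_SIZE = 0x10000
AVSEEK_FORCE = 0x20000


class FileObjectIO:
    """Hypothetical adapter exposing read/seek callbacks over a file-like object."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def read_packet(self, buf_size: int) -> bytes:
        # Return up to buf_size bytes; an empty result stands in for end of stream.
        return self.fileobj.read(buf_size)

    def seek(self, offset: int, whence: int) -> int:
        if whence & AVSEEK_SIZE:
            # Report the total size without changing the current position.
            current = self.fileobj.tell()
            size = self.fileobj.seek(0, os.SEEK_END)
            self.fileobj.seek(current, os.SEEK_SET)
            return size
        # SEEK_SET/SEEK_CUR/SEEK_END use the same values in C stdio and Python.
        return self.fileobj.seek(offset, whence & ~AVSEEK_FORCE)


stream = FileObjectIO(io.BytesIO(b"RIFF....WAVEfmt "))
header = stream.read_packet(4)        # b"RIFF"
total = stream.seek(0, AVSEEK_SIZE)   # 16, position stays at 4
```

Because the adapter only relies on `read`, `seek`, and `tell`, the same approach would work for any seekable file-like object, not just in-memory buffers.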
End state
To address the issues above, we propose the following end state:
- FFmpeg is the primary backend for TorchAudio’s I/O functions info, load, save.
- FFmpeg-, SoX-, and soundfile-based backends are user-selectable from said I/O functions and are no longer determined by global state.
- All of FFmpeg, SoX, and soundfile are optional dependencies.
- I/O functions no longer support TorchScript.
We expect this end state to bring greater cross-platform consistency, simplify our codebase, and improve the user experience.
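To make "user-selectable, not determined by global state" concrete, here is a minimal sketch of per-call dispatch, assuming a registry that maps backend names to loaders for whichever optional dependencies were found, and a fixed preference order with FFmpeg first. `load`, `PREFERENCE`, and the registry parameter are hypothetical names for illustration; the loaders return strings standing in for decoded audio, and TorchAudio's actual dispatcher may be structured differently.

```python
from typing import Callable, Dict, Optional

# Hypothetical loader type: takes a path and returns decoded audio
# (a string here, standing in for a waveform tensor and sample rate).
Loader = Callable[[str], str]

# Order tried when the caller does not name a backend: FFmpeg first.
PREFERENCE = ("ffmpeg", "sox", "soundfile")


def load(path: str, backend: Optional[str] = None,
         registry: Optional[Dict[str, Loader]] = None) -> str:
    """Dispatch to a backend chosen per call rather than via global state.

    `registry` holds only the backends whose optional dependency is available;
    it is a parameter here solely to keep the sketch self-contained.
    """
    registry = registry or {}
    if backend is not None:
        # An explicit choice must be available; there is no silent fallback.
        if backend not in registry:
            raise ValueError(
                f"Backend {backend!r} is not available; available: {sorted(registry)}"
            )
        return registry[backend](path)
    # No explicit choice: use the first available backend in preference order.
    for name in PREFERENCE:
        if name in registry:
            return registry[name](path)
    raise RuntimeError("No audio I/O backend is available.")


registry = {
    "ffmpeg": lambda p: f"ffmpeg:{p}",
    "soundfile": lambda p: f"soundfile:{p}",
}
load("a.wav", registry=registry)                       # 'ffmpeg:a.wav'
load("a.wav", backend="soundfile", registry=registry)  # 'soundfile:a.wav'
```

Raising on an unavailable explicit backend, instead of falling back, keeps the caller's choice authoritative, which is the point of moving selection out of global state.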
Plan
Release 2.0
- [x] Introduce an option to {info, load, save} that lets users choose any of FFmpeg, SoX, and soundfile as the I/O backend for both file paths and file objects, while preserving the existing behavior, i.e., Linux and Mac distributions default to SoX and Windows distributions default to soundfile.
  - Doing so naturally removes TorchScript support in {info, load, save}.
- [x] Add deprecation warnings that convey that release 2.1 will make FFmpeg the default backend for files and file objects for {info, load, save}, and encourage users to switch over to FFmpeg.
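A minimal sketch of the transitional behavior in this release, assuming a helper that resolves the backend for a single {info, load, save} call: an explicit choice is honored silently, while an omitted one falls back to the platform default (soundfile on Windows, SoX elsewhere) and emits a warning. `resolve_backend` is hypothetical, and using `FutureWarning` is our choice for illustration, not necessarily the category TorchAudio emits.

```python
import sys
import warnings
from typing import Optional


def resolve_backend(backend: Optional[str] = None, platform: str = sys.platform) -> str:
    """Hypothetical helper: pick the backend for one info/load/save call."""
    if backend is not None:
        # An explicit choice is honored and needs no warning.
        return backend
    # Preserve the existing platform defaults: soundfile on Windows, SoX otherwise.
    default = "soundfile" if platform.startswith("win") else "sox"
    warnings.warn(
        f"No backend was specified, so {default!r} is used. A future release "
        "will make 'ffmpeg' the default; pass backend='ffmpeg' to switch now.",
        FutureWarning,
        stacklevel=2,
    )
    return default
```

`stacklevel=2` attributes the warning to the caller's line, so users can see which call site still relies on the implicit default.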
Release 2.1
- [x] Make FFmpeg the default backend for files and file objects for {info, load, save} across all platforms. (#3241)
- [x] Make SoX an optional dependency that is dynamically linked if available. Linking enables the SoX backend and torchaudio.sox_effects. #3497
- [x] Remove file-like object handling for the SoX backend. #3035
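One way to treat a native library as an optional, dynamically loaded dependency is to look for it at runtime and enable the dependent features only if it loads. The sketch below uses the standard library's `ctypes.util.find_library` and `ctypes.CDLL`; `load_optional_library` and `SOX_AVAILABLE` are hypothetical names, and this illustrates the idea rather than TorchAudio's actual loading mechanism.

```python
import ctypes
import ctypes.util
from typing import Optional


def load_optional_library(name: str) -> Optional[ctypes.CDLL]:
    """Return a handle to lib<name> if it can be found and loaded, else None."""
    path = ctypes.util.find_library(name)
    if path is None:
        # Not installed (or not on the search path): the feature stays disabled.
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # Found but failed to load, e.g. a wrong architecture or missing symbols.
        return None


# Hypothetical flag: the SoX backend and sox_effects would be enabled only if libsox loads.
SOX_AVAILABLE = load_optional_library("sox") is not None
```

Returning `None` instead of raising lets the rest of the package import cleanly on systems without SoX, with an error surfacing only when a SoX-specific feature is actually requested.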
Release 2.2
- [x] Remove dependence of backend selection on global state. #3559
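To illustrate why global backend state is a problem: with a process-wide setter, a call made in one module silently changes the backend used by every other module. With per-call selection, a user who wants a non-default backend can pin it locally, for example with `functools.partial`. `set_backend`, `load_global`, and `load` below are hypothetical stand-ins, not TorchAudio functions, and return strings in place of decoded audio.

```python
import functools
from typing import Optional

# Global-state style: one module-level variable shared by every caller.
_current_backend = "ffmpeg"


def set_backend(name: str) -> None:
    global _current_backend
    _current_backend = name


def load_global(path: str) -> str:
    # The result depends on whatever any code in the process last set.
    return f"{_current_backend}:{path}"


# Per-call style: the backend is an argument, defaulting to FFmpeg.
def load(path: str, backend: Optional[str] = None) -> str:
    return f"{backend or 'ffmpeg'}:{path}"


# A caller that needs soundfile binds that choice locally instead of
# mutating process-wide state that other code also depends on.
load_with_soundfile = functools.partial(load, backend="soundfile")
```

Calling `set_backend("sox")` anywhere changes what `load_global` does everywhere, whereas `load_with_soundfile` affects only the code that uses it.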
Related discussion in core: https://github.com/pytorch/pytorch/issues/81102. FFmpeg integration is currently duplicated between torchvision and torchaudio. It would be cool if it moved to a single implementation (in a new, separate package?).
I also support eliminating the global backend state and requiring users to maintain this selection themselves if they want to use a non-default backend.
Hi @vadimkantorov — thanks for flagging. Somewhat independently of this particular issue, we are indeed considering consolidating media I/O in a separate package. We'll post updates on the outcomes of our discussions to https://github.com/pytorch/pytorch/issues/81102.