TorchAudio Dispatcher Migration

Overview

We propose the following end state for TorchAudio’s I/O functions info, load, save:

FFmpeg is the primary backend for TorchAudio’s I/O functions info, load, save.
FFmpeg-, SoX-, and soundfile-based backends are user-selectable from said I/O functions and are no longer determined by global state.
All of FFmpeg, SoX, and soundfile are optional dependencies.
I/O functions no longer support TorchScript.

Context

TorchAudio’s functions info, load, and save currently rely on two third-party libraries: SoX and soundfile. Whereas SoX is used in the Linux and Mac distributions, soundfile is used in the Windows distribution.

Through the years, we’ve encountered several issues with SoX:

Its handling of in-memory decoding is buggy and requires a local patch to fix.
It accesses the internal structure of the FILE* object. As a result, SoX 14.4.2 does not compile on Windows with MSVC versions newer than 2013, which prevents us from using SoX across all platforms.
It attempts to rewind stdin.
It has not been actively developed or maintained since 2015, so we have little confidence that the aforementioned issues will be addressed.
It has caused other user-facing problems:
- https://github.com/pytorch/audio/issues/2870
- https://github.com/pytorch/audio/issues/2356

Separately, our work on streaming I/O introduced FFmpeg as a dependency. FFmpeg's advantages over SoX include the following:

It’s battle-tested: it has been developed for over 20 years and is widely used in industry.
The library code is portable across Linux, Mac, and Windows.
It supports a wide variety of codecs, from basic to advanced, for both audio and video.
It supports GPU acceleration in decoding and encoding.
The C API offers a high degree of customizability.
- It abstracts away many things like codecs, file formats, and devices.
- It allows for implementing custom I/O, such as in-memory decoding/encoding through the file-like object protocol.
It’s being actively developed, with the latest version (5.1) having been released in July 2022.

End state

To address the issues above, we propose the following end state:

FFmpeg is the primary backend for TorchAudio’s I/O functions info, load, save.
FFmpeg-, SoX-, and soundfile-based backends are user-selectable from said I/O functions and are no longer determined by global state.
All of FFmpeg, SoX, and soundfile are optional dependencies.
I/O functions no longer support TorchScript.

We anticipate that this end state will bring greater cross-platform consistency, a simpler codebase, and an improved user experience.
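To make "user-selectable, not global state, optional dependencies" concrete, here is a minimal sketch of per-call dispatch. Everything in it (`PREFERENCE`, `Backend`, `make_registry`, `load`, and the placeholder loaders that return a string tag instead of audio) is hypothetical and not TorchAudio's actual API. Each backend reports whether its optional dependency is installed; the caller may name a backend explicitly, and otherwise the first installed one in preference order (FFmpeg first) is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Preference order used when the caller does not pick a backend: FFmpeg first.
PREFERENCE = ("ffmpeg", "sox", "soundfile")


@dataclass(frozen=True)
class Backend:
    """Hypothetical backend descriptor (not TorchAudio's real API)."""
    name: str
    is_available: Callable[[], bool]  # probes the optional dependency
    load: Callable[[str], str]        # placeholder: returns a tag, not audio


def make_registry(installed: Dict[str, bool]) -> Dict[str, Backend]:
    # Availability is injected so the sketch runs without any of the three
    # libraries; a real probe would look for the shared libraries instead.
    return {
        name: Backend(
            name=name,
            is_available=lambda n=name: installed.get(n, False),
            load=lambda uri, n=name: f"{n}:{uri}",
        )
        for name in PREFERENCE
    }


def load(uri: str, backend: Optional[str] = None, *,
         registry: Dict[str, Backend]) -> str:
    """Dispatch one call; no module-level backend setting is consulted."""
    if backend is not None:
        if backend not in registry:
            raise ValueError(f"unknown backend: {backend!r}")
        if not registry[backend].is_available():
            raise RuntimeError(f"backend {backend!r} is not installed")
        return registry[backend].load(uri)
    # No explicit choice: fall back to the first installed backend.
    for name in PREFERENCE:
        if registry[name].is_available():
            return registry[name].load(uri)
    raise RuntimeError("no audio I/O backend is installed")


reg = make_registry({"ffmpeg": True, "sox": True})
print(load("a.wav", registry=reg))                 # ffmpeg:a.wav
print(load("a.wav", backend="sox", registry=reg))  # sox:a.wav
```

Because the choice travels with each call, two libraries in the same process can use different backends without interfering with each other.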

Plan

Release 2.0

[x] Introduce an option to {info, load, save} that lets users choose FFmpeg, SoX, or soundfile as the I/O backend for both file paths and file objects, while preserving the existing behavior, i.e., Linux and Mac distributions default to SoX and Windows distributions default to soundfile.
- Doing so naturally removes TorchScript support in {info, load, save}.
[x] Add deprecation warnings stating that release 2.1 will make FFmpeg the default backend for files and file objects in {info, load, save}, and encouraging users to switch to FFmpeg.
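The two transitional items above (a per-call option that keeps the old platform defaults, plus a deprecation warning) could be combined roughly as follows. This is a sketch only: `resolve_backend`, its `platform` parameter, and the warning text are assumptions for illustration, not TorchAudio's code.

```python
import sys
import warnings
from typing import Optional


def resolve_backend(backend: Optional[str] = None,
                    platform: str = sys.platform) -> str:
    """Hypothetical helper: pick the backend for one call during the transition."""
    if backend is not None:
        # An explicit per-call choice always wins and needs no warning.
        return backend
    # Preserve the existing defaults: soundfile on Windows, SoX elsewhere.
    default = "soundfile" if platform.startswith("win") else "sox"
    warnings.warn(
        f"Defaulting to the {default!r} backend; a future release will make "
        "'ffmpeg' the default. Pass backend='ffmpeg' to opt in now.",
        DeprecationWarning,
        stacklevel=2,
    )
    return default
```

Only callers relying on the implicit default see the warning, so users who already pass a backend are unaffected by the later default switch.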

Release 2.1

[x] Make FFmpeg the default backend for files and file objects for {info, load, save} across all platforms. (#3241)
[x] Make SoX an optional dependency that is dynamically linked if available. When linked, it enables the SoX backend and torchaudio.sox_effects. (#3497)
[x] Remove file-like object handling for the SoX backend. (#3035)
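A rough sketch of the two SoX-related checks above: probing for libsox at runtime instead of requiring it at install time, and rejecting file-like objects on the SoX path. The helper names `sox_is_available` and `check_sox_source` are hypothetical; `ctypes.util.find_library` is the standard-library lookup and only reports whether a library can be found, it does not load it.

```python
import ctypes.util
import os
from typing import BinaryIO, Callable, Optional, Union

Source = Union[str, os.PathLike, BinaryIO]


def sox_is_available(
    find: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> bool:
    # Look up libsox at runtime; a missing library simply disables the backend.
    return find("sox") is not None


def check_sox_source(src: Source, available: bool) -> None:
    """Validate a request routed to the (hypothetical) SoX backend."""
    if not available:
        raise RuntimeError("SoX backend requested but libsox was not found")
    if not isinstance(src, (str, os.PathLike)):
        # File-like objects are left to the other backends (e.g. FFmpeg).
        raise TypeError(
            "The SoX backend accepts file paths only; "
            "use backend='ffmpeg' for file-like objects"
        )
```

The finder is injectable so the check can be exercised without libsox installed.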

Release 2.2

[x] Remove dependence of backend selection on global state. (#3559)

Jan 03 '23 17:01 hwangjeff

Related discussion in core: https://github.com/pytorch/pytorch/issues/81102. FFmpeg integration is currently duplicated between torchvision and torchaudio. It would be cool if it moved to a single implementation (perhaps in a new, separate package?).

I also support eliminating global backend state and making users maintain this selection themselves if they want to use a non-default backend.

Jan 23 '23 15:01 vadimkantorov

Hi @vadimkantorov — thanks for flagging. Somewhat independently of this particular issue, we are indeed considering consolidating media I/O in a separate package. We'll post updates on the outcomes of our discussions to https://github.com/pytorch/pytorch/issues/81102.

Jan 24 '23 18:01 hwangjeff