
TorchAudio Dispatcher Migration

Open hwangjeff opened this issue 3 years ago • 2 comments

Overview

We propose the following end state for TorchAudio’s I/O functions info, load, save:

  • FFmpeg is the primary backend for TorchAudio’s I/O functions info, load, save.
  • FFmpeg-, SoX-, and soundfile-based backends are user-selectable from said I/O functions and are no longer determined by global state.
  • All of FFmpeg, SoX, and soundfile are optional dependencies.
  • I/O functions no longer support TorchScript.

Context

TorchAudio’s functions info, load, and save currently rely on two third-party libraries: SoX and soundfile. Whereas SoX is used in the Linux and Mac distributions, soundfile is used in the Windows distribution.

Through the years, we’ve encountered several issues with SoX:

  • Its handling of in-memory decoding is buggy and requires a local patch to fix.
  • It accesses the internal structure of the FILE* object. As a result, SoX 14.4.2 does not compile on Windows with MSVC versions newer than 2013, which precludes us from using SoX across platforms.
  • It attempts to rewind stdin.
  • It has not been actively developed/maintained since 2015, which doesn’t lend confidence that the aforementioned issues will be addressed.
  • It has caused other user-facing problems:
    • https://github.com/pytorch/audio/issues/2870
    • https://github.com/pytorch/audio/issues/2356

Separately, our work around streaming I/O introduced FFmpeg as a dependency. FFmpeg's advantages over SoX include the following:

  • It’s battle-tested. It has been developed for over 20 years and is widely used in industry.
  • The library code is portable across Linux, Mac, and Windows.
  • It supports a wide variety of codecs, from basic to advanced, for both audio and video.
  • It supports GPU acceleration in decoding and encoding.
  • The C API offers a high degree of customizability.
    • It abstracts away many things like codecs, file formats, and devices.
    • It allows for implementing custom I/O such as in-memory decoding/encoding with file-like object protocol.
  • It’s being actively developed, with the latest version (5.1) having been released in July 2022.
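
To make the "file-like object protocol" point concrete, here is a minimal sketch of in-memory encoding and decoding through an object that only exposes read(), write(), and seek(). It uses Python's standard-library wave module as a stand-in codec, not FFmpeg or TorchAudio; the helper names encode_wav and decode_wav are hypothetical.

```python
import array
import io
import wave


def encode_wav(samples, sample_rate):
    """Encode 16-bit mono PCM samples into an in-memory WAV file object."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(array.array("h", samples).tobytes())
    buffer.seek(0)  # rewind so a reader starts at the beginning
    return buffer


def decode_wav(fileobj):
    """Decode 16-bit mono PCM from any seekable file-like object."""
    with wave.open(fileobj, "rb") as reader:
        frames = reader.readframes(reader.getnframes())
        sample_rate = reader.getframerate()
    pcm = array.array("h")
    pcm.frombytes(frames)
    return pcm.tolist(), sample_rate
```

Note that the decoder needs to seek within the object it is handed, which is why a backend that tries to rewind a non-seekable stream such as stdin runs into trouble.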

End state

To address the issues above, we propose the following end state:

  • FFmpeg is the primary backend for TorchAudio’s I/O functions info, load, save.
  • FFmpeg-, SoX-, and soundfile-based backends are user-selectable from said I/O functions and are no longer determined by global state.
  • All of FFmpeg, SoX, and soundfile are optional dependencies.
  • I/O functions no longer support TorchScript.

We anticipate this end state bringing greater cross-platform consistency, simplifying our codebase, and delivering an improved user experience.
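
As a rough illustration of this end state, the sketch below selects a backend per call instead of reading global state, and treats every backend as optional. It is not TorchAudio's implementation: the Backend class, select_backend, the stand-in decoders, and the fallback order after FFmpeg are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Backend:
    name: str
    is_available: Callable[[], bool]  # probes for the optional dependency
    load: Callable[[str], str]        # stand-in decoder, returns a description


def select_backend(
    backends: Dict[str, Backend],
    priority: Sequence[str],
    requested: Optional[str] = None,
) -> Backend:
    """Pick the requested backend, or the first available one by priority."""
    if requested is not None:
        if requested not in backends:
            raise ValueError(f"unknown backend: {requested!r}")
        if not backends[requested].is_available():
            raise RuntimeError(f"backend {requested!r} is not available")
        return backends[requested]
    for name in priority:
        if backends[name].is_available():
            return backends[name]
    raise RuntimeError("no I/O backend is available")


def load(uri, backend=None, *, backends, priority=("ffmpeg", "sox", "soundfile")):
    """Per-call dispatch: the choice is an argument, not module-level state."""
    return select_backend(backends, priority, backend).load(uri)


# Demo registry with hard-coded availability (FFmpeg pretends to be missing).
DEMO_BACKENDS = {
    "ffmpeg": Backend("ffmpeg", lambda: False, lambda uri: f"ffmpeg:{uri}"),
    "sox": Backend("sox", lambda: True, lambda uri: f"sox:{uri}"),
    "soundfile": Backend("soundfile", lambda: True, lambda uri: f"soundfile:{uri}"),
}
```

With this shape, two calls in the same process can use different backends without interfering with each other, and a missing optional dependency only surfaces as an error when that backend is explicitly requested.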

Plan

Release 2.0

  • [x] Introduce an option to {info, load, save} that allows users to choose any of FFmpeg, SoX, and soundfile as the I/O backend for both file paths and file objects, while preserving the existing default behavior, i.e. Linux and Mac distributions default to SoX and Windows distributions default to soundfile.
    • Doing so naturally removes TorchScript support in {info, load, save}.
  • [x] Add deprecation warnings stating that release 2.1 will make FFmpeg the default backend for files and file objects for {info, load, save}, and encourage users to switch over to FFmpeg.
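
The two items above could look roughly like the sketch below: an explicit backend argument wins, and falling back to the legacy platform default emits a warning about the upcoming change. The function names and the warning text are hypothetical, and the choice of FutureWarning is an assumption rather than what TorchAudio emits.

```python
import sys
import warnings


def legacy_default_backend(platform: str = sys.platform) -> str:
    """Return the pre-migration default: soundfile on Windows, SoX elsewhere."""
    return "soundfile" if platform.startswith("win") else "sox"


def resolve_backend(backend=None, platform: str = sys.platform) -> str:
    """Honor an explicit choice; otherwise warn and use the legacy default."""
    if backend is not None:
        return backend
    default = legacy_default_backend(platform)
    warnings.warn(
        f"No backend was specified, so the legacy default {default!r} is used. "
        "A later release will make 'ffmpeg' the default; "
        "pass backend='ffmpeg' to opt in now.",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller, not this helper
    )
    return default
```

Users who already pass a backend explicitly never see the warning, so the message only reaches code whose behavior is about to change.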

Release 2.1

  • [x] Make FFmpeg the default backend for files and file objects for {info, load, save} across all platforms. (#3241)
  • [x] Make SoX an optional dependency that is dynamically linked if available. Linking enables the SoX backend and torchaudio.sox_effects. (#3497)
  • [x] Remove file-like object handling for the SoX backend. (#3035)
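
For readers unfamiliar with the "dynamically linked if available" pattern, here is a small Python sketch using ctypes. It only illustrates load-if-present behavior and is not how TorchAudio's extension links SoX; try_load_library, SOX_AVAILABLE, and require_sox are hypothetical names.

```python
import ctypes
import ctypes.util
from typing import Optional


def try_load_library(name: str) -> Optional[ctypes.CDLL]:
    """Return a handle to the shared library if it can be found and loaded."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None  # library is not installed: treat as an optional feature
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None  # found on disk but could not be loaded


# Probe once at import time; features that need SoX check this flag.
_sox = try_load_library("sox")
SOX_AVAILABLE = _sox is not None


def require_sox() -> None:
    """Raise a clear error when a SoX-only feature is used without SoX."""
    if not SOX_AVAILABLE:
        raise RuntimeError("SoX was not found; install libsox to use this feature")
```

The point of the pattern is that importing the package never fails because SoX is missing; only the SoX-specific entry points do.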

Release 2.2

  • [x] Remove dependence of backend selection on global state. (#3559)

hwangjeff • Jan 03 '23 17:01

Related discussion in core: https://github.com/pytorch/pytorch/issues/81102. FFmpeg integration is currently duplicated between torchvision and torchaudio. It would be cool if it moved to a single implementation (in a new / separate package?)

I also support eliminating the global backend state and having users maintain this selection themselves if they want to use a non-default backend.

vadimkantorov • Jan 23 '23 15:01

Hi @vadimkantorov — thanks for flagging. Somewhat independently of this particular issue, we are indeed considering consolidating media I/O in a separate package. We'll post updates on the outcomes of our discussions to https://github.com/pytorch/pytorch/issues/81102.

hwangjeff • Jan 24 '23 18:01