diffusers add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334

I have added AudioDiffusionPipeline and LatentAudioDiffusionPipeline which I intend to migrate from https://github.com/teticio/audio-diffusion. I have added them to the main src as opposed to the community pipelines due to the inheritance of LatentAudioDiffusionPipeline from AudioDiffusionPipeline, which cannot be done in a single pipeline file, as well as the fact that the Mel class is needed to convert from audio to images and vice versa. It might make sense to move the Mel class somewhere more central, as it could be used by other pipelines.
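For readers unfamiliar with the audio-to-image step, here is a minimal sketch of the general idea: a spectrogram level in decibels is quantized to a grayscale byte, and the byte is mapped back to decibels on the way out. This is not the Mel class from this PR. The function names and the 80 dB dynamic range are assumptions made for illustration, and the mel filterbank and phase reconstruction are left out entirely.

```python
# Hypothetical illustration only: names and TOP_DB are assumptions, not taken from the PR.
TOP_DB = 80.0  # assumed dynamic range, in decibels, that is mapped onto 0..255


def db_to_pixel(db: float, top_db: float = TOP_DB) -> int:
    """Map a level in [-top_db, 0] dB to a grayscale byte in [0, 255]."""
    clipped = min(0.0, max(-top_db, db))  # clamp into the representable range
    return int((clipped + top_db) * 255.0 / top_db + 0.5)  # round to nearest byte


def pixel_to_db(pixel: int, top_db: float = TOP_DB) -> float:
    """Invert db_to_pixel, up to quantization error."""
    return pixel * top_db / 255.0 - top_db


if __name__ == "__main__":
    for level in (0.0, -33.3, -80.0, -100.0):
        pixel = db_to_pixel(level)
        print(level, pixel, round(pixel_to_db(pixel), 3))
```

Inside the clamped range the round trip loses at most half a quantization step (80 / 255 / 2, about 0.16 dB under the assumed range), which is why an 8-bit image can stand in for the spectrogram magnitudes.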

Nov 25 '22 18:11 teticio

The documentation is not available anymore as the PR was closed or merged.

Nov 25 '22 18:11 HuggingFaceDocBuilderDev

@patrickvonplaten See previous PR for additional comments (https://github.com/huggingface/diffusers/pull/1334)

Nov 25 '22 18:11 teticio

@patrickvonplaten I guess you must be super busy, but it would be great if you could just let me know whether the basic approach works for you: moving Mel into models so that it can be used as a composable component in the pipeline (and therefore replaced by a neural alternative). Then I can migrate my saved models and existing repo to this format ahead of the release to diffusers. Bear in mind that I had to make Mel a LOADABLE_CLASS for this. Thanks and sorry for the bother.

Nov 30 '22 20:11 teticio

@patrickvonplaten I had to add ConfigMixin to LOADABLE_CLASSES so that Mel could be instantiated with from_pretrained. (Previously I had added Mel here, but I agree that is too specific.) Can you think of a better solution? Maybe there should be a LoadableClassMixin instead?
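To make the trade-off concrete, here is a toy sketch of a loader that only knows how to load components whose base class is registered. Everything in it is a simplified stand-in: the classes, the `LOADABLE` dict and `load_component` are hypothetical and do not reproduce the code in pipeline_utils.py.

```python
# Toy stand-ins for illustration; these are NOT the diffusers classes.
class ConfigMixin:
    def __init__(self, **config):
        self.config = dict(config)


class ModelMixin(ConfigMixin):
    @classmethod
    def from_pretrained(cls, config: dict):
        # A real loader would also read weights from disk; omitted here.
        return cls(**config)


# Hypothetical registry: base class -> name of the method used to load it.
LOADABLE = {ModelMixin: "from_pretrained"}


def load_component(cls, config: dict):
    """Load a component if one of its base classes is registered."""
    for base, method_name in LOADABLE.items():
        if issubclass(cls, base):
            return getattr(cls, method_name)(config)
    raise ValueError(f"{cls.__name__} has no registered loadable base class")


class MelAsConfig(ConfigMixin):  # base class is not in the registry
    pass


class MelAsModel(ModelMixin):  # inherits a base class that is already registered
    pass


if __name__ == "__main__":
    mel = load_component(MelAsModel, {"sample_rate": 22050})
    print(type(mel).__name__, mel.config)
```

In this sketch a component that only derives from the config mixin cannot be loaded unless the registry itself is extended, while one that derives from an already registered base class loads with no registry change.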

I jumped the gun and assumed that we are close to being able to merge, so I have updated my existing repo and model artefacts to be compatible with this PR. In other words, the slow tests will now work also.

Dec 02 '22 08:12 teticio

Hey @teticio, I think you could still use the ModelMixin for the Mel class so that we don't need to update pipeline_utils.py :-)

Dec 02 '22 17:12 patrickvonplaten

This PR looks good for merge to me! Note that we should change the model_index.json as done in this PR because Mel should not be in the public API: https://huggingface.co/teticio/latent-audio-diffusion-ddim-256/commit/ac08e817d31ac82498abf4eee6fd3954db41fe27

We do the same for other models :-) See: https://huggingface.co/BAAI/AltDiffusion-m9/blob/main/model_index.json#L17
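As a rough sketch of what such a change looks like: each component entry in model_index.json is a pair of library (or pipeline module) name and class name. The module name `audio_diffusion` below is an assumption made for illustration; the linked commit is the authoritative version.

```python
import json

# Hypothetical entries for illustration; check the linked commit for the real values.
before = {"mel": ["diffusers", "Mel"]}  # resolved via the public diffusers namespace
after = {"mel": ["audio_diffusion", "Mel"]}  # assumed: resolved via the pipeline's own module

if __name__ == "__main__":
    print(json.dumps(before, indent=2))
    print(json.dumps(after, indent=2))
```

With the second form the class name stays the same and only the place it is looked up changes, so Mel no longer has to be exported at the top level.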

Dec 05 '22 17:12 patrickvonplaten

@patrickvonplaten Hey, thanks for all the great suggestions and support along the way. Just a couple of things before we put this one to bed:

Mel is still importable from diffusers and pipelines - I think you may want to remove it from there. I've updated all my model repos following your example.
I see that the slow tests are failing - do the Dockerfiles only get built nightly? If so, then the missing libsndfile should hopefully get installed with the changes I made.

Dec 05 '22 19:12 teticio