add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334
I have added AudioDiffusionPipeline and LatentAudioDiffusionPipeline which I intend to migrate from https://github.com/teticio/audio-diffusion. I have added them to the main src as opposed to the community pipelines due to the inheritance of LatentAudioDiffusionPipeline from AudioDiffusionPipeline, which cannot be done in a single pipeline file, as well as the fact that the Mel class is needed to convert from audio to images and vice versa. It might make sense to move the Mel class somewhere more central, as it could be used by other pipelines.
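To make the "audio to images and vice versa" step concrete, here is a minimal sketch of the quantisation part only: mapping log-magnitude spectrogram values (in dB) to greyscale bytes and back. The function names and the `top_db=80.0` default are illustrative assumptions, not the Mel class's actual API, and the mel transform itself is left out.

```python
# Hypothetical helpers: the names and the top_db default are assumptions for
# illustration and are not taken from the Mel class in this PR.
def db_to_pixel(db, top_db=80.0):
    """Map a log-magnitude value in [-top_db, 0] dB to a greyscale byte in [0, 255]."""
    scaled = (db + top_db) * 255.0 / top_db
    # Clip to the valid byte range, then round to the nearest integer.
    return int(min(max(scaled, 0.0), 255.0) + 0.5)


def pixel_to_db(pixel, top_db=80.0):
    """Approximate inverse of db_to_pixel: map a byte back to [-top_db, 0] dB."""
    return pixel * top_db / 255.0 - top_db


# One row of a spectrogram, in dB relative to the loudest bin.
row_db = [-80.0, -40.0, -12.5, 0.0]
row_pixels = [db_to_pixel(v) for v in row_db]
row_restored = [pixel_to_db(p) for p in row_pixels]
```

The round trip is lossy: each value comes back to within one quantisation step (`top_db / 255` dB), which is why the inverse is only approximate.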
@patrickvonplaten See previous PR for additional comments (https://github.com/huggingface/diffusers/pull/1334)
@patrickvonplaten I guess you must be super busy, but it would be great if you could just let me know whether the basic approach works for you: moving Mel into models so that it can be used as a composable component in the pipeline (and therefore replaced by a neural alternative). Then I can migrate my saved models and existing repo to this format ahead of the release to diffusers. Bear in mind that I had to make Mel a LOADABLE_CLASS for this. Thanks and sorry for the bother.
@patrickvonplaten I had to add ConfigMixin to the LOADABLE_CLASSES so that Mel could be instantiated with from_pretrained. (Previously I had added Mel here, but I agree that is too specific.) Can you think of a better solution? Maybe there should be a LoadableClassMixin instead?
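For readers following the loading question, the requirement is that a component with no weights, only constructor settings, can be saved to a config file and rebuilt from it. Below is a minimal standard-library sketch of that round trip. `ConfigurableComponent`, `MelSketch`, `from_config_dir` and the parameter names are hypothetical stand-ins, not the diffusers ConfigMixin or the Mel class from this PR.

```python
import json
import os
import tempfile


class ConfigurableComponent:
    """Hypothetical stand-in for a config-only mixin (not diffusers' ConfigMixin)."""

    config_name = "config.json"

    def __init__(self, **config):
        # Keep the constructor arguments so they can be serialised later.
        self.config = dict(config)

    def save_config(self, directory):
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, self.config_name), "w") as f:
            json.dump(self.config, f, indent=2, sort_keys=True)

    @classmethod
    def from_config_dir(cls, directory):
        # Rebuild the component purely from its saved settings.
        with open(os.path.join(directory, cls.config_name)) as f:
            return cls(**json.load(f))


class MelSketch(ConfigurableComponent):
    # Parameter names and defaults are illustrative assumptions.
    def __init__(self, x_res=256, y_res=256, sample_rate=22050):
        super().__init__(x_res=x_res, y_res=y_res, sample_rate=sample_rate)


with tempfile.TemporaryDirectory() as tmp:
    MelSketch(x_res=64, y_res=64).save_config(tmp)
    restored = MelSketch.from_config_dir(tmp)
```

Because the component carries no weights, a config file alone is enough to reconstruct it.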
I jumped the gun and assumed that we are close to being able to merge, so I have updated my existing repo and model artefacts to be compatible with this PR. In other words, the slow tests will now work as well.
Hey @teticio, I think you could still use the ModelMixin for the Mel class so that we don't need to update pipeline_utils.py :-)
This PR looks good for merge to me!
Note that we should change the model_index.json as done in this PR because Mel should not be in the public API: https://huggingface.co/teticio/latent-audio-diffusion-ddim-256/commit/ac08e817d31ac82498abf4eee6fd3954db41fe27
We do the same for other models :-) See: https://huggingface.co/BAAI/AltDiffusion-m9/blob/main/model_index.json#L17
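As a rough illustration of the model_index.json change described above, the sketch below rewrites one component entry so that it points at a pipeline-specific module instead of the top-level package. The `[module, class name]` layout mirrors the linked AltDiffusion example. The module name `audio_diffusion`, the "before" value and the helper function are assumptions made for illustration, and the real file holds more entries than shown.

```python
import json

# Assumed, abbreviated "before" state of a model_index.json loaded as a dict.
model_index = {
    "mel": ["diffusers", "Mel"],
    "unet": ["diffusers", "UNet2DModel"],
}


def point_component_at_module(index, name, module):
    """Return a copy of index in which component `name` is loaded from `module`."""
    updated = dict(index)
    _, class_name = updated[name]
    updated[name] = [module, class_name]
    return updated


# "audio_diffusion" is an assumed module name for the pipeline's own package.
updated_index = point_component_at_module(model_index, "mel", "audio_diffusion")
print(json.dumps(updated_index, indent=2))
```

With an entry of this form, Mel is resolved from the pipeline's own module and does not need to be exported from the public API.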
@patrickvonplaten Hey, thanks for all the great suggestions and support along the way. Just a couple of things before we put this one to bed:
- Mel is still importable from diffusers and pipelines - I think you may want to remove it from there. I've updated all my model repos following your example.
- I see that the slow tests are failing - do the Dockerfiles only get built nightly? If so, then the missing libsndfile should hopefully get installed with the changes I made.