[🌟 New Model] ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Open Bai-YT opened this issue 1 year ago • 9 comments

Model/Pipeline/Scheduler description

ConsistencyTTA, introduced in the paper Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation, is an efficient text-to-audio (TTA) generation model. ConsistencyTTA achieves a 400x generation speed-up over a comparable diffusion-based TTA model while retaining its generation quality and diversity.
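For readers new to consistency distillation, here is a toy sketch of where the speed-up comes from. A consistency function maps a noisy sample directly to a clean estimate, so a single network call can replace a long denoising loop. Optional extra steps (re-noise to a lower level, then map back again) trade some speed for quality. This is the generic multistep consistency sampler from the original consistency-models paper (Song et al., 2023), using that paper's noise parameterization on plain Python lists. It is not ConsistencyTTA's code, whose latent-space parameterization may differ. `consistency_sample` and `toy_f` are hypothetical names.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def consistency_sample(
    f: Callable[[Vector, float], Vector],
    dim: int,
    timesteps: List[float],
    eps: float = 0.002,
    seed: int = 0,
) -> Vector:
    """Toy multistep consistency sampling (after Song et al., 2023).

    f(x_t, t) is a consistency function: it maps a sample at noise level t
    straight to an estimate of the clean sample. `timesteps` is decreasing and
    starts at the maximum noise level T; a single entry means one-step sampling.
    """
    rng = random.Random(seed)
    T = timesteps[0]
    # One network call: pure noise at level T -> clean estimate.
    x = f([T * rng.gauss(0.0, 1.0) for _ in range(dim)], T)
    # Optional refinement: re-noise to a lower level t, then map back in one call.
    for t in timesteps[1:]:
        sigma = math.sqrt(t * t - eps * eps)
        x = f([xi + sigma * rng.gauss(0.0, 1.0) for xi in x], t)
    return x


if __name__ == "__main__":
    calls = {"n": 0}

    def toy_f(x: Vector, t: float) -> Vector:
        # Hypothetical stand-in for a distilled network; it only counts calls.
        calls["n"] += 1
        return [xi / (1.0 + t) for xi in x]

    consistency_sample(toy_f, dim=4, timesteps=[80.0])
    print("one-step calls:", calls["n"])
    calls["n"] = 0
    consistency_sample(toy_f, dim=4, timesteps=[80.0, 20.0, 5.0])
    print("three-step calls:", calls["n"])
```

The number of network evaluations equals the number of entries in `timesteps`. A multi-step diffusion sampler, by contrast, calls the network once per denoising step, and twice per step when conventional classifier-free guidance is used.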

Given its high generation quality and fast inference, we believe integrating this model into diffusers would make the library more appealing to text-to-audio researchers and users! Thank you very much.

Open source status

  • [X] The model implementation is available.
  • [X] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

The open-source code implementation can be found at https://github.com/Bai-YT/ConsistencyTTA.

There is also a simplified implementation for inference only: https://github.com/Bai-YT/ConsistencyTTA/tree/main/easy_inference.

The model checkpoints can be found at https://huggingface.co/Bai-YT/ConsistencyTTA.

I am the main author of the code and am more than happy to assist with the integration.

Bai-YT • Jun 05 '24 21:06

@sanchit-gandhi @Vaibhavs10 FYI.

sayakpaul • Jun 06 '24 06:06

@Bai-YT Thank you for your awesome work! I've just finished reading the paper and think I have a good grasp of the modeling and inference code that needs to be converted to diffusers.

@sayakpaul Could I pick this up if no one's working on it?

a-r-r-o-w • Jun 27 '24 04:06

Yeah for sure.

sayakpaul • Jun 27 '24 04:06

@a-r-r-o-w Cool! But let's put it in the community folder to start with.

yiyixuxu • Jun 27 '24 06:06

Sure, sounds good.

a-r-r-o-w • Jun 27 '24 06:06

I appreciate everyone's time and help! Massive thanks.

Bai-YT • Jun 27 '24 07:06

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Sep 14 '24 15:09

Hi everyone,

Thank you for the effort in adding ConsistencyTTA to diffusers! I just wanted to check in and see whether there has been any update. If there's anything I can help with, please let me know!

Sincerely, Yatong

Bai-YT • Oct 13 '24 08:10

Hi @Bai-YT, thanks for your awesome work! We do have a PR open here, but we also had different plans on how to support it (relevant discussion in the PR). The pipeline works and one can run inference, but I haven't found the time to implement what was discussed in the PR yet. I will try giving it a shot in the near future.

a-r-r-o-w • Oct 15 '24 20:10

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Nov 09 '24 15:11

@sayakpaul Not stale. At some point, it would be nice for Diffusers community pipelines to support modeling and pipeline code living together in a single file. This is currently not supported: my PR keeps the modeling code in a separate file on the Hub and the pipeline code in the Diffusers community folder, but YiYi raised concerns about this approach, so we didn't proceed with it.

a-r-r-o-w • Nov 09 '24 16:11

Hi Aryan, sorry, I only just saw the message. Thank you very much for handling this!

I took a look at the PR and it looks awesome! From an algorithmic perspective, I just wanted to mention two things:

  • When ConsistencyTTA is used as a one-step or few-step model (which is what it is designed for), setting the conventional CFG guidance_scale to a value other than 1 will likely not perform well. Instead, CFG is handled internally by the model through guidance_scale_cond, which should be much more powerful and perform much better.
  • The model was not trained/distilled with negative prompts, so I'm not sure how it will perform/behave with them.
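The difference described in the first bullet can be illustrated with a toy sketch. Conventional CFG runs the network twice per step, once with the prompt and once with a null prompt, and extrapolates from the unconditional toward the conditional prediction. At guidance_scale = 1 the combination reduces to the conditional prediction alone, so no external guidance is applied. A guidance-conditioned model takes the guidance strength as an input (exposed as guidance_scale_cond, per the comment above) and needs only one call. The helper names and the `model(x, t, prompt, w)` signature are hypothetical, not the pipeline's actual API.

```python
from typing import Callable, List

Vector = List[float]


def cfg_combine(eps_uncond: Vector, eps_cond: Vector, guidance_scale: float) -> Vector:
    """Conventional classifier-free guidance: eps_u + w * (eps_c - eps_u).

    With w == 1 this returns eps_cond unchanged, i.e. no external guidance.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def step_external_cfg(model: Callable, x: Vector, t: float, prompt: str,
                      guidance_scale: float) -> Vector:
    # Two network calls per step: with the prompt and with a null prompt.
    eps_cond = model(x, t, prompt)
    eps_uncond = model(x, t, None)
    return cfg_combine(eps_uncond, eps_cond, guidance_scale)


def step_internal_guidance(model: Callable, x: Vector, t: float, prompt: str,
                           guidance_scale_cond: float) -> Vector:
    # One network call: the guidance strength is an input the model was
    # distilled with, so no unconditional pass is needed.
    return model(x, t, prompt, guidance_scale_cond)
```

For a one-step model, the external path doubles the cost of the only network call and applies an extrapolation the model was not distilled for. This is consistent with the recommendation above to leave guidance_scale at 1 and use guidance_scale_cond instead.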

I absolutely understand that these options exist for API compatibility with other models, and it's very nice to have them here. I'm not requesting any code changes, but it might be worth mentioning these points in the documentation so that users know what to expect.

I wish you a nice rest of the day!

Bai-YT • Nov 10 '24 00:11

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] • Dec 04 '24 15:12