
[community] ConsistencyTTA

a-r-r-o-w opened this issue 1 year ago • 17 comments

What does this PR do?

Adds support for ConsistencyTTA as a community pipeline.

Paper · Project Page · Code · Checkpoints

Fixes #8414.

TODO:

  • ~should we remove conversion script or place it in scripts (which would create import errors since it depends on custom unet in the pipeline file)? Old audioldm conversion scripts don't work either btw~
  • move converted checkpoints to authors account
  • ~how to handle custom unet path present in pipeline file when using .from_pretrained?~

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

a-r-r-o-w avatar Jun 29 '24 22:06 a-r-r-o-w

Hey @sayakpaul @yiyixuxu @Bai-YT, I believe this is ready for an initial review. The pipeline works with the local checkpoint, but I'm not sure how to make it work with the hosted checkpoint: diffusers attempts to look for a unet file that does not exist, since the unet is part of the community pipeline file. Maybe we could upload a custom modeling file for the unet and use trust_remote_code=True?

error log
Traceback (most recent call last):
  File "/home/hywayadmin/datadisk/disk1/home/hamsadmin/aryanvs/github/personal/diffusers/examples/community/test.py", line 15, in <module>
    pipe = ConsistencyTTAPipeline.from_pretrained(
  File "/home/hywayadmin/datadisk/disk1/home/hamsadmin/aryanvs/github/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/hywayadmin/datadisk/disk1/home/hamsadmin/aryanvs/github/personal/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 702, in from_pretrained
    cached_folder = cls.download(
  File "/home/hywayadmin/datadisk/disk1/home/hamsadmin/aryanvs/github/venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/hywayadmin/datadisk/disk1/home/hamsadmin/aryanvs/github/personal/diffusers/src/diffusers/pipelines/pipeline_utils.py", line 1344, in download
    raise ValueError(
ValueError: unet/pipeline_consistency_txt2audio.py as defined in `model_index.json` does not exist in a-r-r-o-w/ConsistencyTTA and is not a module in 'diffusers/pipelines'.
code
import scipy.io.wavfile
import torch
from pipeline_consistency_txt2audio import ConsistencyTTAPipeline
from diffusers import DDIMScheduler

model_id = "a-r-r-o-w/ConsistencyTTA"
# model_id = "../../ConsistencyTTA"

# scheduler = DDIMScheduler.from_pretrained(
#     model_id,
#     subfolder="scheduler",
# )
scheduler = None

pipe = ConsistencyTTAPipeline.from_pretrained(
    model_id,
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "ducks quacking"

audio = pipe(
    prompt,
    num_inference_steps=1,
    audio_length_in_s=10,
    guidance_scale=1,
    guidance_scale_cond=4,
    generator=torch.Generator().manual_seed(42),
).audios[0]

scipy.io.wavfile.write("audio.wav", rate=16000, data=audio)

Converted model: https://huggingface.co/a-r-r-o-w/ConsistencyTTA

a-r-r-o-w avatar Jun 30 '24 21:06 a-r-r-o-w

https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview is probably a better reference if you are using custom components.

sayakpaul avatar Jul 01 '24 00:07 sayakpaul

Hey, addressed the above issues and ready for review. I did not realize how simple it was to use a custom unet until now!

minimal code for testing
import scipy.io.wavfile
import torch
from diffusers import DiffusionPipeline


model_id = "a-r-r-o-w/ConsistencyTTA"

pipe = DiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    custom_pipeline="examples/community/pipeline_consistency_txt2audio.py",
    # or once merged: custom_pipeline="pipeline_consistency_txt2audio",
    trust_remote_code=True,
).to("cuda")

prompt = "Ducks quacking"

generator = torch.Generator().manual_seed(42)
audio = pipe(
    prompt,
    num_inference_steps=1,
    audio_length_in_s=10,
    guidance_scale=1,
    guidance_scale_cond=4,
    generator=generator,
).audios[0]

scipy.io.wavfile.write("audio.wav", rate=16000, data=audio)

Some results:

Prompts (the corresponding audio players are embedded in the original comment):

  • Ducks quacking
  • An infant yelling as a young boy talks while a hard surface is slapped several times
  • Rolling thunder with lightning strikes
  • Multiple gun shots followed by a woman screaming

a-r-r-o-w avatar Jul 01 '24 21:07 a-r-r-o-w

I'm not sure why we would put the unet on the Hub but the pipeline in the community folder

cc @DN6 here since he's interested in improving the dev experience in adding community pipelines with custom components

yiyixuxu avatar Jul 01 '24 22:07 yiyixuxu

I initially planned on having the unet as part of the community pipeline file, but couldn't get it to work with from_pretrained for the Hub-hosted model: the current implementation looks for the unet implementation either in diffusers core or in the Hub unet/ directory, as specified in model_index.json. It's also been a while since I've looked at the codebase, so if there's a straightforward way to do it, I've either forgotten or missed it completely.
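Purely for illustration, the relevant model_index.json entry for a Hub-hosted custom component looks roughly like this (module and class names here are hypothetical; see the actual a-r-r-o-w/ConsistencyTTA repo for the real entries):

```json
{
  "_class_name": "ConsistencyTTAPipeline",
  "unet": ["modeling_unet", "CustomUNet"]
}
```

The first element of the pair names the module file the class should be imported from; for a Hub-hosted component, that file is expected under unet/ in the repo, which is exactly the lookup behavior described above.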

a-r-r-o-w avatar Jul 01 '24 22:07 a-r-r-o-w

you can put it in the same file like https://github.com/huggingface/diffusers/blob/main/examples/community/kohya_hires_fix.py

agree that it isn't very nice, if you have ideas on how to let users make community pipelines + components, feel free to share

yiyixuxu avatar Jul 02 '24 04:07 yiyixuxu

FWIW, we already allow sharing of custom components and custom pipelines: https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview#community-components

sayakpaul avatar Jul 02 '24 04:07 sayakpaul

you can put it in the same file like https://github.com/huggingface/diffusers/blob/main/examples/community/kohya_hires_fix.py

agree that it isn't very nice, if you have ideas on how to let users make community pipelines + components, feel free to share

I believe this only works when there aren't any new modeling components in the unet itself (in this case, we have additional guidance embeddings). For the linked Kohya pipeline, only config changes are made, which is fine. When we add modeling changes, we error out before even reaching the from_unet method call, because diffusers first attempts to load the custom state dict (which has new modeling components) into the UNet2DConditionModel implementation (which does not have guidance embeddings), leading to unexpected-keys-in-state-dict errors. If we specify a custom unet implementation in model_index.json, diffusers only looks for it in core or in the unet/ directory on the Hub.
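The failure mode described above can be reproduced with a small standalone sketch (toy nn.Modules standing in for UNet2DConditionModel and the custom unet; these are not actual diffusers classes):

```python
import torch
from torch import nn


class BaseUNet(nn.Module):
    """Stand-in for UNet2DConditionModel: no guidance embedding."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Linear(4, 4)


class GuidedUNet(BaseUNet):
    """Stand-in for the custom unet, which adds a guidance embedding."""
    def __init__(self):
        super().__init__()
        self.guidance_embedding = nn.Linear(4, 4)


# A checkpoint saved from the custom model...
custom_state = GuidedUNet().state_dict()

# ...fails to load into the base class, before any from_unet call could run.
error_message = ""
try:
    BaseUNet().load_state_dict(custom_state)  # strict=True by default
except RuntimeError as err:
    error_message = str(err)

print("guidance_embedding.weight" in error_message)
```

Strict loading rejects the unexpected guidance_embedding.* keys, which is exactly where pipeline initialization stops.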

a-r-r-o-w avatar Jul 02 '24 04:07 a-r-r-o-w

I see. maybe we can do this:

  • for pipelines that do not come with new checkpoints, we add them to the community folder
  • for pipelines that come with new checkpoints, we add the code to the repo (both pipeline and models) since there is already a nice folder structure here, but we will add the link and info to the README page

cc @DN6 here because I think this is something in his to-do list - Let me know what you think!

yiyixuxu avatar Jul 02 '24 17:07 yiyixuxu

Sorry for my question but could you highlight the problems or what is missing from our custom pipeline AND custom component support? Custom components allow you to have both new checkpoints and new component level blocks.

SDXL Japanese is a good example of this.

Sharing the doc link once again: https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview#community-components

sayakpaul avatar Jul 02 '24 17:07 sayakpaul

@sayakpaul I think the issue is doing so with an "official community pipeline", i.e. storing the code on GitHub - we currently do not allow that, and IMO it is awkward to store the model code on the Hub and the pipeline code on GitHub. We can either:

  1. figure out a way to allow custom components for "official community pipelines" (from what I understand @DN6 is interested in looking into that)
  2. just encourage people to use the Hub methods for community pipelines with custom components, but still help showcase these pipelines

yiyixuxu avatar Jul 02 '24 17:07 yiyixuxu

Sorry for my question but could you highlight the problems or what is missing from our custom pipeline AND custom component support? Custom components allow you to have both new checkpoints and new component level blocks.

SDXL Japanese is a good example of this.

Sharing the doc link once again: https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview#community-components

Let me try and better explain what I've tried, what works and what doesn't.


I have followed the documentation for community components and implemented ConsistencyTTA in a similar fashion to SDXL Japanese. We make use of the trust_remote_code=True feature to use a custom unet implementation. You can verify this by checking the ConsistencyTTA/model_index.json (which uses the modeling_unet file), and ConsistencyTTA/unet/modeling_unet.py which contains the actual implementation.

This works well and, in that regard, this PR is ready.


YiYi mentions her concern that the pipeline code is present on GitHub community folder, but unet code is present on HF Hub. What she would like is for the unet code to be put in the GitHub community pipeline file too. Currently, this is not possible to do AFAIK. I'll explain why:

(1) model_index.json specifies where to look for each modeling class UNetDotDotDot. Here, if the file abc is specified, Diffusers will attempt to find it either in:

  • Diffusers core modeling components, OR
  • https://huggingface.co/<USER>/<REPO>/blob/<REVISION>/unet/abc.py

Now, this of course works when the modeling class UNetDotDotDot is present in either of those places. But if not, custom pipeline initialization will fail.

Note here that Diffusers does not look for the modeling code in the community pipeline file, so if we implement the unet in the pipeline file as YiYi asked, Diffusers will fail to initialize the pipeline with an error saying the code for UNetDotDotDot could not be found.

(2) In order to initialize our custom unet, we can use the approach as done in Kohya script (linked by YiYi above) and delegate initialization to the from_unet method. This works well but only in cases where we make configuration changes to the underlying UNet model that is used in model_index.json. In most cases, this underlying model is UNet2DConditionModel.

Now, consider a case where you implement a new UNet that has some new modeling components never present before in UNet2DConditionModel, and push checkpoint to hub. If you now try to load it and use the delegate from_unet initialization, it will fail. Why? Because Diffusers will first try to load the checkpoint into UNet2DConditionModel instead of our custom implementation, at which point it would error out saying something like Unexpected keys found in state dict: <NEW_MODELING_COMPONENTS>, and therefore never even reach the from_unet delegated initialization call.

TL;DR: from the things I've tried and from my understanding of the from_pretrained implementation for custom pipelines, there is no way to address YiYi's concern without erroring out.


I hope this makes sense and better explains my older comment. A simple solution for this would be to extend functionality where for a model class mentioned in model_index.json, we search for its implementation in:

  • Diffusers core (already supported)
  • Model files on Hub (already supported)
  • Custom pipeline file used for running community pipelines (not supported)
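The extended lookup could be sketched as follows (hypothetical helper, not the actual diffusers implementation):

```python
import types


def resolve_component_class(name, core_registry, hub_modules, pipeline_module):
    """Hypothetical lookup order for a class named in model_index.json."""
    if name in core_registry:              # 1. diffusers core (already supported)
        return core_registry[name]
    if name in hub_modules:                # 2. model files on the Hub (already supported)
        return hub_modules[name]
    if hasattr(pipeline_module, name):     # 3. the community pipeline file (proposed)
        return getattr(pipeline_module, name)
    raise ValueError(f"{name} not found in core, on the Hub, or in the pipeline file")


# Example: the class only exists in the (simulated) community pipeline file.
class CustomUNet: ...

pipeline_file = types.SimpleNamespace(CustomUNet=CustomUNet)
cls = resolve_component_class(
    "CustomUNet", core_registry={}, hub_modules={}, pipeline_module=pipeline_file
)
print(cls is CustomUNet)  # True
```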

a-r-r-o-w avatar Jul 02 '24 18:07 a-r-r-o-w

@yiyixuxu @a-r-r-o-w

So, the idea to put the component code on Hub originated from Transformers: https://huggingface.co/docs/transformers/en/custom_models.

YiYi mentions her concern that the pipeline code is present on GitHub community folder, but unet code is present on HF Hub. What she would like is for the unet code to be put in the GitHub community pipeline file too. Currently, this is not possible to do AFAIK.

Yes, that will fail. You are right. trust_remote_code always assumes that we're pulling the code from the repo_id we're providing to the pipeline from_pretrained().

We could introduce another argument here, custom_components, similar to how we have custom_pipeline, but I am slightly worried about the developer experience here. Could be nice to hear what @LysandreJik thinks about it.

sayakpaul avatar Jul 03 '24 01:07 sayakpaul

Yes, that will fail. You are right. trust_remote_code always assumes that we're pulling the code from the repo_id we're providing to the pipeline from_pretrained().

We could introduce another argument here, custom_components, similar to how we have custom_pipeline, but I am slightly worried about the developer experience here. Could be nice to hear what @LysandreJik thinks about it.

@sayakpaul Ideally, once we move the unet code to the community pipeline file, we won't need to set trust_remote_code=True. Can we implement custom component class loading such that Diffusers also looks for the classes in the custom_pipeline file itself? If we did this, making custom pipelines with different implementations of UNets/Transformers/etc. would be much simpler, and the modeling code could live in either:

  • Hub subfolders (which require trust_remote_code=True)
  • the pipeline file (no trust_remote_code=True needed if it's in the diffusers community folder; needed if the pipeline file is on the Hub)

a-r-r-o-w avatar Jul 17 '24 12:07 a-r-r-o-w

Can we implement the custom component class loading such that Diffusers will look for them in the custom_pipeline file itself? If we do this, making custom pipelines with different implementations of UNets/Transformers/etc. would become simplified, and the modeling code could exist either in

Okay I have a couple of follow-up questions here.

If you have custom components along with a custom pipeline and if those components are in the pipeline implementation, would those components have parameters too? Of course, schedulers don't need this, but others (like UNets, VAEs) do.

For just custom pipelines, contributors still open PRs, adding them to our GitHub folder, which we mirror to our Hub repository.

I would like to take the example of SDXL Japanese. It has a custom component which is used in the custom pipeline file. I think the reason these work is that the custom component doesn't have any custom params for itself that we need to serialize.

But the feature you have in mind can have custom components with custom params that might need to be serialized, no?

In any case, feel free to start a PR if you want, but I'm okay with having things as-is, because with a little more work it works for a variety of combinations of custom components.
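As a stdlib-only illustration of the serialization concern (names hypothetical, not diffusers API): a custom component's extra parameters have to round-trip through a config file, which is what the config machinery would need to handle for components defined in the pipeline file.

```python
import json
import os
import tempfile


class GuidedUNetDemo:
    """Toy component with a custom param (guidance_embed_dim) that must be serialized."""

    def __init__(self, in_channels=8, guidance_embed_dim=256):
        self.config = {"in_channels": in_channels,
                       "guidance_embed_dim": guidance_embed_dim}

    def save_config(self, folder):
        with open(os.path.join(folder, "config.json"), "w") as f:
            json.dump(self.config, f)

    @classmethod
    def from_config(cls, folder):
        with open(os.path.join(folder, "config.json")) as f:
            return cls(**json.load(f))


with tempfile.TemporaryDirectory() as folder:
    GuidedUNetDemo(guidance_embed_dim=128).save_config(folder)
    reloaded = GuidedUNetDemo.from_config(folder)

print(reloaded.config["guidance_embed_dim"])  # 128
```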

sayakpaul avatar Jul 18 '24 04:07 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 14 '24 15:09 github-actions[bot]

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

