[Feature Request][Community] Ability to pass text_embeddings/uncond_embeddings as arguments in pipe call
Is your feature request related to a problem? Please describe. I'm experimenting with aesthetic gradients and need to override the pipe call to pass text_embeddings/uncond_embeddings. It could also save some time when generating many images with the same prompt.
Describe the solution you'd like The ability to pass text_embeddings/uncond_embeddings to the pipe call.
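For context, here is a minimal sketch of the classifier-free-guidance bookkeeping a pipeline does with these two tensors internally. The `pipe(...)` signature in the comment is hypothetical (the requested feature, not an existing API), and random tensors stand in for real CLIP text-encoder output:

```python
import torch

# Stand-ins for real CLIP text-encoder output: (batch, seq_len, hidden_dim).
# Shapes match Stable Diffusion's CLIP ViT-L/14 encoder (77 tokens, 768 dims).
text_embeddings = torch.randn(1, 77, 768)    # encoded prompt
uncond_embeddings = torch.randn(1, 77, 768)  # encoded empty/negative prompt

# For classifier-free guidance, the pipeline concatenates the two so the
# UNet can process the unconditional and conditional branches in one batch.
cfg_embeddings = torch.cat([uncond_embeddings, text_embeddings])
assert cfg_embeddings.shape == (2, 77, 768)

# The request: allow passing the precomputed tensors directly, e.g.
#   pipe(text_embeddings=text_embeddings, uncond_embeddings=uncond_embeddings)
# (hypothetical signature), so the text encoder runs once per prompt
# instead of once per generated image.
```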
Describe alternatives you've considered I don't know. Maybe split everything into separate functions to make it less cluttered.
Hey @hadaev8,
Do you think we could use a community pipeline for this? If there is a big use case, it'd also be ok for me to add it to the native stable diffusion pipelines (wdyt @patil-suraj @anton-l @pcuenca ?)
Also see: https://github.com/huggingface/diffusers/pull/958 -> in general I'm OK with this request - curious to hear other opinions though!
Since aesthetic gradients modify the text encoder output, they work with every pipe. I haven't tested with the new inpainting pipeline, but there's no reason it shouldn't work. So I don't think a separate pipe is the right approach.
Also, if it's acceptable in this repo, I'd like to contribute an example notebook or something showing how it works with all the default pipes.
@hadaev8 I'd be interested to see that, do you have a colab available?
Hey @hadaev8 ! Could you point us to an example of aesthetic gradients ? Hearing it for the first time :)
If it's a really big use case I would also be in favor of it. For now I see two things which could benefit from this:
- imagic #958 -> I'm not sure if we can modify the pipeline for this, as the trained checkpoints are not really re-usable and are specific to the prompt and image being edited.
- and stable diffusion videos where we interpolate text embeddings -> but this requires lots of additional stuff and already has a repo and custom pipeline
so unless we have a really big use case, I would like to keep the pipelines simple :)
I've seen the results of it; definitely worth taking a look. The end results are amazing.
@patil-suraj This repo https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
Basically it changes the weights of the text encoder to match CLIP image representations.
Almost all of it can be done outside the pipe, but because of such tuning catastrophic forgetting kicks in, so I think (and the author does this too) it's better to pass unchanged uncond embeddings from the original text model.
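The mechanism above can be sketched in a toy example. This is not the repo's actual code: a single linear layer stands in for the CLIP text encoder, and the tensors, names, and step count are all illustrative. It shows the two key points: nudging the encoder toward an aesthetic CLIP image embedding, and snapshotting the unconditional embedding *before* tuning:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim = 64

# Toy stand-in for the CLIP text encoder: one linear projection.
text_encoder = torch.nn.Linear(dim, dim)
tokens = torch.randn(1, dim)         # stand-in for the tokenized prompt
aesthetic_emb = torch.randn(1, dim)  # CLIP image embedding of "aesthetic" images

# Snapshot the ORIGINAL unconditional embedding before any tuning,
# since fine-tuning causes catastrophic forgetting in the encoder.
with torch.no_grad():
    uncond_embeddings = text_encoder(torch.zeros(1, dim)).clone()

# Aesthetic-gradient steps: move the prompt's text embedding toward the
# aesthetic image embedding by maximizing cosine similarity.
opt = torch.optim.SGD(text_encoder.parameters(), lr=1e-2)
before = F.cosine_similarity(text_encoder(tokens), aesthetic_emb).item()
for _ in range(20):
    opt.zero_grad()
    loss = -F.cosine_similarity(text_encoder(tokens), aesthetic_emb).mean()
    loss.backward()
    opt.step()
after = F.cosine_similarity(text_encoder(tokens), aesthetic_emb).item()
assert after > before  # the embedding moved toward the aesthetic target

# The tuned text_embeddings plus the ORIGINAL uncond_embeddings are what
# you would then want to pass into the pipe call.
text_embeddings = text_encoder(tokens).detach()
```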
In my notebooks I just copy-pasted the whole pipe function for a very minor change. Of course that's my problem, but flexibility is always good.
@dblunk88 https://colab.research.google.com/drive/1RXolb8ozC4qSCZfnfO-PdVSC25Aj1dTZ?usp=sharing Have fun
Awesome, thanks @hadaev8 !
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This still would be nice to have
Actually, I'm fine with adding this! Would someone be interested in opening a PR for it? I won't find time anytime soon, but I'll keep this issue on my radar in case more people ask for it.
@patrickvonplaten I'd like to do it. What do you think: should I reuse the prompt and negative prompt variables and add checks for whether they are already tensors, like the image argument does?
Hey @hadaev8,
I think we should just add new variables just like we've done for UnCLIP here: https://github.com/huggingface/diffusers/pull/1858
Happy to help with a PR :-)
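A sketch of the argument handling being discussed, loosely modeled on the "new variables" approach from #1858. The helper name and the either/or validation logic are hypothetical, not the merged implementation:

```python
import torch

def resolve_prompt_inputs(prompt=None, prompt_embeds=None,
                          negative_prompt=None, negative_prompt_embeds=None):
    """Hypothetical check: accept either a raw prompt string or a
    precomputed embedding tensor, but never both or neither."""
    if (prompt is None) == (prompt_embeds is None):
        raise ValueError("Provide exactly one of `prompt` or `prompt_embeds`.")
    if negative_prompt is not None and negative_prompt_embeds is not None:
        raise ValueError(
            "Provide at most one of `negative_prompt` or "
            "`negative_prompt_embeds`."
        )
    if prompt_embeds is not None and not torch.is_tensor(prompt_embeds):
        raise TypeError("`prompt_embeds` must be a torch.Tensor.")
    return prompt, prompt_embeds, negative_prompt, negative_prompt_embeds

# Precomputed embeddings pass straight through...
embeds = torch.randn(1, 77, 768)
assert resolve_prompt_inputs(prompt_embeds=embeds)[1] is embeds

# ...while ambiguous calls fail fast.
try:
    resolve_prompt_inputs(prompt="a cat", prompt_embeds=embeds)
except ValueError:
    print("rejected: both prompt and prompt_embeds given")
```

This mirrors how the `image` argument already accepts either a PIL image or a tensor, which was the pattern asked about above.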
Mostly solved by https://github.com/huggingface/diffusers/pull/2071