[Community Pipeline] Modified Cross-Attention - Structured Diffusion Guidance for Compositional T2I synthesis

Open apolinario opened this issue 3 years ago • 7 comments

Intro

Community Pipelines were introduced in diffusers==0.4.0 with the idea of allowing the community to quickly add, integrate, and share their custom pipelines on top of diffusers.

You can find a guide about Community Pipelines here. You can also find all the community examples under examples/community/. If you have questions about the Community Pipelines feature, please head to the parent issue.

Idea: Modified cross-attention mechanism

This pipeline aims to implement the paper Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis for Stable Diffusion, improving interpretability of the prompts. Some results from the paper: [image]
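To make the idea a bit more concrete, here is a rough sketch of how I read the modified cross-attention: the attention map is computed once from the full prompt's keys, then applied to several value sequences (the full prompt plus separately encoded sub-phrases), and the outputs are averaged. This is only a toy in plain Python, not the paper's code and not a diffusers API. The function names (`attention_map`, `structured_cross_attention`) and the list-of-lists layout are made up for illustration, and how the extra value sequences are built from the prompt is left out entirely.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain (n x k) @ (k x m) product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_map(queries, keys):
    # Scaled dot-product attention weights, one row per query (image) position.
    dim = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    return [softmax([s / math.sqrt(dim) for s in row]) for row in scores]


def structured_cross_attention(queries, keys, value_sets):
    # Assumption: one shared attention map (from the full prompt's keys),
    # applied to every value sequence, with the outputs averaged.
    attn = attention_map(queries, keys)
    outputs = [matmul(attn, values) for values in value_sets]
    count = len(outputs)
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [
        [sum(out[i][j] for out in outputs) / count for j in range(cols)]
        for i in range(rows)
    ]


# Toy shapes: 2 query positions, 3 prompt tokens, feature size 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values_full_prompt = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
values_sub_phrase = [[3.0, 3.0], [3.0, 3.0], [3.0, 3.0]]  # hypothetical second encoding

result = structured_cross_attention(
    queries, keys, [values_full_prompt, values_sub_phrase]
)
```

With a single value sequence this reduces to ordinary cross-attention, so a real pipeline would only need to swap out the attention layers' value path and leave the rest of the UNet alone.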

apolinario avatar Oct 17 '22 14:10 apolinario

I want to work on this.

20RitikSingh avatar Oct 17 '22 16:10 20RitikSingh

Awesome @20RitikSingh! Feel free to share your progress and ask if you run into any challenges

apolinario avatar Oct 17 '22 18:10 apolinario

This is a super interesting paper! Note that there is a reference implementation included in the .zip of "supplementary material" on the paper submission. Also note that implementation is not released under the Apache License, so I don't know if :hugs: can accept a PR that includes it. It might be safer to do a clean-room implementation from only the description in the text of the paper.

[I Am Not A Lawyer and I Am Not Your Lawyer, but the paper's authors are obligated to remain anonymous until the end of the ICLR 2023 review period, so they might have a hard time speaking up for themselves right now.]

If you do choose to use the code from the supplementary material, you may need to swap some things around to make it better fit diffusers instead of ldm.

keturn avatar Oct 18 '22 04:10 keturn

Also cc @patil-suraj here FYI

patrickvonplaten avatar Oct 20 '22 17:10 patrickvonplaten

Interesting!

@20RitikSingh, let us know if you need any help or have any questions, happy to help :)

patil-suraj avatar Oct 26 '22 15:10 patil-suraj

Hi, I have built a pipeline with diffusers based on the reference implementation. Here is my implementation of the Structured Diffusion Guidance: https://github.com/shunk031/training-free-structured-diffusion-guidance.

However, I am not confident in my implementation because some parts of the published implementation are missing (e.g., the sampler; a new argument, skip, has been added, but its details are not clear). Also, my implementation assumes batch size = 1, so it does not support larger batch sizes. My implementation seems to have a bug related to this limitation as well. Thank you.

shunk031 avatar Oct 29 '22 12:10 shunk031

The paper has been submitted as https://arxiv.org/abs/2212.05032 and is no longer anonymous! https://weixi-feng.github.io/structure-diffusion-guidance/

MIT licensed code by @weixi-feng at https://github.com/weixi-feng/Structured-Diffusion-Guidance

keturn avatar Dec 17 '22 10:12 keturn

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 12 '23 15:04 github-actions[bot]