[`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing

Open tolgacangoz opened this issue 1 year ago • 3 comments

Thanks for the opportunity to fix #6407!

AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy.
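As a rough illustration of how the text embedding module's output could meet the caption embeddings, the toy sketch below pulls the double-quoted strings out of a prompt, leaves one placeholder token per string, and then swaps the placeholder embeddings for per-string embeddings. It is not the code from this PR: the `*` placeholder, the function names, the whitespace "tokenizer", and the one-element vectors are all assumptions made for illustration, and treating "blend" as a substitution at placeholder positions is an interpretation of the description above.

```python
import re

PLACEHOLDER = "*"  # hypothetical placeholder token, assumed for this sketch


def split_prompt(prompt: str) -> tuple[str, list[str]]:
    """Pull out double-quoted strings and leave one placeholder per string."""
    texts = re.findall(r'"(.*?)"', prompt)
    caption = re.sub(r'"(.*?)"', PLACEHOLDER, prompt)
    return caption, texts


def blend_embeddings(tokens, token_embeddings, text_embeddings):
    """Swap each placeholder token's embedding for the next text embedding."""
    pending = iter(text_embeddings)
    blended = []
    for token, emb in zip(tokens, token_embeddings):
        # Fall back to the caption embedding if the text embeddings run out.
        blended.append(next(pending, emb) if token == PLACEHOLDER else emb)
    return blended


caption, texts = split_prompt('a mug with "Any" "Text" written on it')
tokens = caption.split()  # stand-in for a real tokenizer
token_embeddings = [[float(i)] for i in range(len(tokens))]  # stand-in caption embeddings
text_embeddings = [[100.0], [200.0]]  # stand-in for OCR-derived embeddings of "Any", "Text"
blended = blend_embeddings(tokens, token_embeddings, text_embeddings)
```

In a real pipeline the stand-in vectors would come from the text encoder and the OCR model; only the bookkeeping of "one placeholder per rendered string" is shown here.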

Paper: AnyText: Multilingual Visual Text Generation And Editing

Repository: https://github.com/tyxsspa/AnyText

Hugging Face Space: modelscope/AnyText

TODOs:

- ⏳ AuxiliaryLatentModule
- :white_check_mark: AnyTextControlNetModel -> Inherited and adapted from ControlNetModel. The only difference is that it uses a Glyph Block, Position Block, and Fuse Block instead of the input_hint_block or controlnet_cond_embedding of an ordinary ControlNet; that is, the ControlNetConditioningEmbedding is different. I deactivated the ControlNetConditioningEmbedding part and moved the new blocks into AuxiliaryLatentModule just to comply with the figure.
- ⏳ AnyTextPipeline -> Adapted from StableDiffusionControlNetPipeline.
- ⏳ TextEmbeddingModule -> Replaces the encode_prompt() function. I may move what TextEmbeddingModule does into encode_prompt().
- :white_check_mark: convert_anytext_to_diffusers.py
- ⏳ Verify outputs against the original implementation
- ⏳ Finish HF integration & upload converted checkpoints to HF
- ⏳ README.md
- :white_large_square: Make it as simple as possible, but not simpler

tolgacangoz avatar Jul 28 '24 18:07 tolgacangoz

The first results seem okayish...

prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'

| Original Implementation | My Current Implementation |
| --- | --- |
| (image: original) | (image: anytext_mine) |

I am still checking if there is something wrong.

Edit: There was indeed a mistake I made: I forgot to load the parameters for the linear layer at the top of the OCR model.
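A mistake of this kind can usually be caught by comparing the parameter names a model expects with the names present in the checkpoint. The sketch below shows that comparison in plain Python; the function name and the example key names are hypothetical and are not taken from the AnyText checkpoint or from this PR.

```python
def report_key_mismatch(model_keys, checkpoint_keys):
    """Return the parameter names that would be left unloaded or ignored."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint: stays at its initial value.
        "missing": sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but unknown to the model: silently unused.
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical key names, for illustration only.
model_keys = ["backbone.conv.weight", "head.linear.weight", "head.linear.bias"]
checkpoint_keys = ["backbone.conv.weight"]
report = report_key_mismatch(model_keys, checkpoint_keys)
```

In PyTorch, `load_state_dict(..., strict=False)` returns the same information as `missing_keys` and `unexpected_keys`, so inspecting that return value is another way to notice a layer whose parameters were never loaded.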

tolgacangoz avatar Aug 07 '24 15:08 tolgacangoz

Absolutely amazing work here @tolgacangoz :heart: Thank you for picking this up after a stream of contributors (including me) mentioned that they'd take it up but weren't able to! The PR looks mostly good to me but I'll wait for it to be unmarked from draft.

Really really cool work here! cc @sayakpaul

a-r-r-o-w avatar Aug 24 '24 03:08 a-r-r-o-w

Thanks so much @a-r-r-o-w!

This PR is almost done. I am currently working on the Matryoshka model, which I think is the priority. Once that is complete, I will return to this PR right away.

tolgacangoz avatar Aug 24 '24 07:08 tolgacangoz

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 30 '24 15:11 github-actions[bot]

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Feb 13 '25 15:02 github-actions[bot]

Hi @tolgacangoz, are you still working on this?

asomoza avatar Feb 13 '25 20:02 asomoza

Hi Álvaro, thanks for nudging me :) My priorities have had to change over the last 4-5 months. Starting tomorrow, I plan to complete this PR within 1 week.

tolgacangoz avatar Feb 14 '25 06:02 tolgacangoz

This will be my second pipeline contribution, yay :partying_face:

tolgacangoz avatar Mar 01 '25 14:03 tolgacangoz

Thanks a lot, it looks good to me. This is a really amazing project and port to diffusers, with good results.

ccing @a-r-r-o-w because of https://github.com/huggingface/diffusers/pull/8998#issuecomment-2308036940

asomoza avatar Mar 01 '25 17:03 asomoza

Thanks for this opportunity to contribute!

tolgacangoz avatar Mar 11 '25 06:03 tolgacangoz