
Support LinFusion

Open Huage001 opened this issue 1 year ago • 4 comments

What does this PR do?

Support LinFusion. LinFusion accelerates diffusion models by replacing all self-attention layers in the diffusion UNet with distilled Generalized Linear Attention (GLA) layers. The distilled model has linear complexity in the number of tokens and is highly compatible with existing diffusion plugins such as ControlNet, IP-Adapter, and LoRA. The acceleration can be dramatic at high resolutions. Dedicated pipelines for high-resolution generation can be found in the original codebase.
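As context for reviewers, the asymptotic win of linear attention comes from reassociating the attention product so the N x N similarity matrix is never materialized. A minimal sketch (using an ELU-based feature map as an illustrative kernel; LinFusion's distilled GLA layers use their own learned parametrization):

```python
import numpy as np

def feature_map(x):
    # ELU(x) + 1 keeps features positive, a common kernel choice in
    # linear-attention work (an illustrative stand-in, not LinFusion's
    # actual parametrization).
    return np.where(x > 0, x + 1.0, np.exp(x))

def naive_attention(q, k, v):
    # O(N^2 * d): materializes the full N x N similarity matrix.
    phi_q, phi_k = feature_map(q), feature_map(k)
    sim = phi_q @ phi_k.T                      # (N, N)
    return (sim @ v) / sim.sum(axis=1, keepdims=True)

def linear_attention(q, k, v):
    # O(N * d^2): reassociate (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V),
    # so no N x N matrix is ever formed.
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                           # (d, d)
    z = phi_q @ phi_k.sum(axis=0)              # (N,) normalizer
    return (phi_q @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 64, 16
q, k, v = rng.normal(size=(3, n, d))
assert np.allclose(naive_attention(q, k, v), linear_attention(q, k, v))
```

Because phi(K)^T V is a d x d matrix whose size is independent of sequence length, the cost grows linearly rather than quadratically in the number of tokens.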

You can use it with only 1 additional line:

import torch
from diffusers import StableDiffusionPipeline

repo_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16, variant="fp16").to("cuda")

+ pipe.load_linfusion(pipeline_name_or_path=repo_id)

image = pipe("a photo of an astronaut on a moon").images[0]

Currently, stable-diffusion-v1-5/stable-diffusion-v1-5, stabilityai/stable-diffusion-2-1, stabilityai/stable-diffusion-xl-base-1.0, models fine-tuned from them, and pipelines built on them are supported. If the repo_id differs from these, e.g., when using a fine-tuned model from the community, you need to set pipeline_name_or_path explicitly to the base model it was fine-tuned from. Otherwise, this argument is optional and LinFusion infers it from the current pipeline. Alternatively, you can pass pretrained_model_name_or_path_or_dict to load LinFusion weights from another source. When LinFusion is no longer needed, you can remove it with pipe.unload_linfusion().
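For a community fine-tune, the call might look like the following (a sketch against the API proposed in this PR; the checkpoint name is a placeholder, not a real repo):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# A hypothetical community fine-tune of SDXL (placeholder repo id).
repo_id = "some-user/sdxl-finetune"
pipe = StableDiffusionXLPipeline.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16
).to("cuda")

# The repo id is not one of the recognized base models, so point
# LinFusion at the base model this checkpoint was fine-tuned from.
pipe.load_linfusion(
    pipeline_name_or_path="stabilityai/stable-diffusion-xl-base-1.0"
)

image = pipe("a photo of an astronaut on a moon").images[0]

# Restore the original attention layers when done.
pipe.unload_linfusion()
```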

Accordingly, we also update the doc under docs/source/en/optimization/linfusion.md for a specific example.

Thanks for your efforts in reviewing this pull request in advance! We are open to any changes to make sure LinFusion can best fit the current diffusers library!

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline?
  • [x] Did you read our philosophy doc (important for complex PRs)?
  • [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Huage001 avatar Oct 06 '24 06:10 Huage001

Thanks a lot for this PR!

Could you maybe provide some numbers and resulting images so we can see how much gain we get from GLA?

sayakpaul avatar Oct 07 '24 05:10 sayakpaul

I cannot reproduce any speedups at 1K or 2K resolutions; is it only noticeable at higher resolutions? The analysis was done using SDXL / Euler-a / steps=30 / CFG=6 / dtype=bf16 (applying standard LinFusion to an existing pipeline, not using the optimized pipelines that ship in the original repo).

Also, I see a pretty significant quality impact and need to reduce the CFG scale significantly to avoid burn artifacts consistent with distilled models.

Attaching samples without and with LinFusion: sdxl-default, sdxl-linfusion-pretrained

vladmandic avatar Oct 09 '24 14:10 vladmandic


Dear sayakpaul and vladmandic,

Thanks for your reply and sorry for the late response because we are continuously optimizing the model.

Indeed, the acceleration from LinFusion is significant at higher resolutions. Please refer to the tables below for running times on SD-2.1 and SD-XL, respectively:

SD-2.1:

| Width x Height | w/ LinFusion (sec / 50 steps) | w/o LinFusion (sec / 50 steps) |
| --- | --- | --- |
| 768x768 | 2.43 | 2.29 |
| 1024x768 | 3.02 | 3.12 |
| 2048x768 | 5.29 | 6.74 |
| 3072x768 | 7.44 | 11.41 |
| 4096x768 | 9.81 | 17.61 |

SD-XL:

| Width x Height | w/ LinFusion (sec / 50 steps) | w/o LinFusion (sec / 50 steps) |
| --- | --- | --- |
| 1024x1024 | 6.18 | 5.50 |
| 2048x1024 | 10.87 | 10.86 |
| 3072x1024 | 16.27 | 17.52 |
| 4096x1024 | 21.88 | 25.08 |

The environment is 1 A100 with PyTorch 2.4.
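The resolution dependence follows from the token count: self-attention operates on a latent grid downsampled from the image, so tokens grow linearly with pixel area while softmax attention cost grows quadratically in tokens. A rough back-of-the-envelope sketch (assuming, for illustration only, attention over an 8x-downsampled grid; real UNets place attention at several levels):

```python
# Rough cost scaling of attention with resolution (illustrative;
# the 8x downsample factor is a simplifying assumption, not the
# exact architecture of any particular UNet).

def tokens(width, height, downsample=8):
    return (width // downsample) * (height // downsample)

def quadratic_cost(n):   # softmax attention: O(n^2)
    return n * n

def linear_cost(n):      # linear attention: O(n)
    return n

base = tokens(768, 768)    # 9216 tokens
big = tokens(4096, 768)    # 49152 tokens

# Going from 768x768 to 4096x768 multiplies tokens by ~5.3x, so
# quadratic attention cost grows ~28x while linear attention cost
# grows only ~5.3x, which is why the gap widens at high resolution.
quad_ratio = quadratic_cost(big) / quadratic_cost(base)
lin_ratio = linear_cost(big) / linear_cost(base)
print(round(quad_ratio, 1), round(lin_ratio, 1))  # 28.4 5.3
```

This matches the trend in the tables above: near 768x768 the two variants are comparable, and the LinFusion advantage appears as width grows.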

For quality comparisons, we attach the following evaluation results of SD-v1.5, SD-v2.1, and SD-XL on the COCO 30000 benchmark:

| Model | Metric | w/ LinFusion | w/o LinFusion |
| --- | --- | --- | --- |
| SD-v1.5 | FID | 12.57 | 12.86 |
| SD-v1.5 | CLIP-T | 0.323 | 0.321 |
| SD-v2.1 | FID | 13.84 | 12.84 |
| SD-v2.1 | CLIP-T | 0.329 | 0.333 |
| SD-XL | FID | 15.72 | 14.74 |
| SD-XL | CLIP-T | 0.338 | 0.340 |

Here is a qualitative example of SD-v2.1:

We refer readers to our paper for more results.

Since the acceleration is more significant at larger resolutions, LinFusion could be particularly useful for generating panorama images with StableDiffusionPanoramaPipeline or high-resolution images with DemoFusionSDXLPipeline, which are currently implemented with patch-wise strategies that can significantly affect quality.

This pull request modifies the parent class DiffusionPipeline. If you believe this modification is too invasive, we could instead package LinFusion as a pip-installable library and provide it as a community pipeline without touching the native code of diffusers. Feel free to let us know your opinion. :)

Huage001 avatar Oct 19 '24 02:10 Huage001

Thanks so much! Seems like FID takes a whole 1-point hit, which feels like a lot. But visually, we don't see much difference. Maybe putting these models on https://imgsys.org/ to get a better preference ranking would be better. Cc: @isidentical.

> If you believe the modification is too aggressive, we could also consider making a package that can be installed with pip and making it a community pipeline without touching the native codes of diffusers. Feel free to let us know your opinions. :)

Sure, I think for now, we could support this awesome work with a separate library. Based on the community interest, we can make it more integral within diffusers. Would that work for you? In any case, we're happy to include a note/guide in our documentation.

Do you also plan to support models like Flux?

sayakpaul avatar Oct 19 '24 08:10 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 12 '24 15:11 github-actions[bot]