
[enhancement]: Feature Suggestion: Implementing WaveSpeed for Accelerating Flux by 10x

Open Aittor opened this issue 1 year ago • 3 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

Contact Details

[email protected]

What should this feature add?

Dear Invoke AI Team,

I hope this message finds you well. I am writing to propose the integration of WaveSpeed into the Flux workflow within Invoke AI.

WaveSpeed is an optimization toolkit for diffusion inference that reports speedups of up to 10x. Applying it to Flux could substantially reduce generation times and improve overall throughput within Invoke AI.

Key Benefits of Integration:

  1. Performance Optimization: Accelerate Flux by up to 10x, enabling faster image generation and smoother workflows.
  2. Improved User Experience: Shortened waiting times translate to enhanced satisfaction for users relying on real-time outputs.
  3. Scalability: WaveSpeed’s optimization capabilities align with the needs of large-scale operations, particularly for demanding use cases in AI content generation.

WaveSpeed Highlights:

  • Open-source and actively maintained repository.
  • Compatible with existing AI frameworks.
  • Proven benchmarks showcasing significant acceleration across various tasks.

I believe this integration aligns perfectly with Invoke AI’s commitment to innovation and efficiency. I am happy to provide additional details, collaborate on testing, or assist with the initial setup to evaluate feasibility.

Thank you for considering this suggestion. I look forward to hearing your thoughts and exploring the potential of this enhancement together.

Best regards, William

Alternatives

No response

Additional Content

No response

Aittor · Jan 25, 2025

The associated diffusers implementation is ParaAttention. Both versions (the Comfy-WaveSpeed node pack and ParaAttention) provide two speed-enhancing mechanisms: First Block Cache (FBCache) and a simplified torch.compile path.
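
For reference, enabling FBCache on a diffusers Flux pipeline via ParaAttention looks roughly like the sketch below. This is a minimal example based on my reading of the ParaAttention README; the `apply_cache_on_pipe` helper, the threshold value, the model ID, and the prompt are illustrative and may differ from the current API.

```python
import torch
from diffusers import FluxPipeline

# FBCache adapter as documented in the ParaAttention README (API may change)
from para_attn.first_block_cache.diffusers_adapters import apply_cache_on_pipe

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Reuse cached transformer output whenever the first block's residual changes
# by less than the threshold between denoising steps; a higher threshold skips
# more steps (faster, but lower fidelity).
apply_cache_on_pipe(pipe, residual_diff_threshold=0.08)

image = pipe("a photo of a cat", num_inference_steps=28).images[0]
image.save("flux_fbcache.png")
```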

Some notes:

  • While this is very new code, the same author was responsible for developing stable-fast
  • The Comfy version linked has recently been updated for better compatibility with SDXL (and SD1.5, see Comfy-WaveSpeed#84) in addition to the original Flux and SD3.5 support
  • However, ParaAttention only supports Flux (ParaAttention#20)
  • torch.compile does not work on Windows due to a missing Triton dependency (see Comfy-WaveSpeed#38); however, there is a reasonably credible Triton fork for Windows that the installable ComfyUI distribution already uses (a rough sketch of the torch.compile path follows this list)
  • There does not appear to be an equivalent Triton fork for macOS
  • Unlike the MIT-licensed Comfy code, ParaAttention carries a restrictive license that bars use on hosted services, which would rule it out for Invoke Professional
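
For completeness, the torch.compile half of WaveSpeed essentially boils down to compiling the Flux transformer with the standard torch/diffusers APIs, roughly as below (a minimal sketch; the compile mode, model ID, and prompt are just examples):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

# torch.compile's default inductor backend generates Triton kernels for the
# GPU, which is why this path currently fails on Windows and macOS without a
# working Triton build.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

# The first call is slow while compilation happens; later calls reuse the
# compiled graph.
image = pipe("a photo of a cat", num_inference_steps=28).images[0]
```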

Taken together, these notes mean that only FBCache for Flux could be supported by Invoke for now, and even then only if ParaAttention is re-licensed.

iwr-redmond · Jan 25, 2025

Even without torch.compile, the speedup from WaveSpeed is significant.

CommanderJ · Jan 31, 2025

First Block Cache is being considered for inclusion in diffusers proper, which means this FR may become viable after all!
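
For anyone wondering what would actually need to be merged: the core of First Block Cache is a small runtime check, roughly like the sketch below. This is illustrative pseudocode of the technique as I understand it from the WaveSpeed/ParaAttention descriptions, not diffusers or ParaAttention code; the function name, cache layout, and threshold are all made up for the example.

```python
import torch

def transformer_forward_with_fbcache(blocks, hidden_states, cache, threshold=0.1):
    """Illustrative First Block Cache step (hypothetical helper, not a real API).

    `blocks` is a list of transformer blocks (callables on a hidden-state
    tensor); `cache` is a dict carried across denoising steps.
    """
    # Always run the first block: it is cheap relative to the full stack and its
    # output serves as a proxy for how much the input changed since last step.
    first_out = blocks[0](hidden_states)
    first_residual = first_out - hidden_states

    prev = cache.get("first_residual")
    if prev is not None:
        # Relative change of the first block's residual vs. the previous step.
        rel_diff = (first_residual - prev).abs().mean() / prev.abs().mean().clamp_min(1e-6)
        if rel_diff < threshold:
            # The input barely changed: reuse the cached contribution of the
            # remaining blocks instead of recomputing them.
            return first_out + cache["rest_residual"]

    # Otherwise run the remaining blocks normally and refresh the cache.
    out = first_out
    for block in blocks[1:]:
        out = block(out)
    cache["first_residual"] = first_residual
    cache["rest_residual"] = out - first_out
    return out

# Tiny self-contained demo with toy "blocks" (illustrative only)
blocks = [torch.nn.Linear(8, 8) for _ in range(4)]
cache = {}
x = torch.randn(1, 8)
y = transformer_forward_with_fbcache(blocks, x, cache)
```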

iwr-redmond · Mar 31, 2025