[bug]: 5.0 release ignores quantization
Is there an existing issue for this problem?
- [X] I have searched the existing issues
Operating system
Windows
GPU vendor
Nvidia (CUDA)
GPU model
RTX 4090
GPU VRAM
24 GB
Version number
5
Browser
Chrome
Python dependencies
No response
What happened
Loading FP8 models uses the same amount of VRAM as loading the full, unquantized versions of FLUX, maxing out my 24 GB.
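For what it's worth, a rough way to confirm this (a minimal sketch that bypasses InvokeAI's own model manager; the checkpoint file names are placeholders) is to compare the VRAM allocated after pushing each checkpoint's tensors to the GPU without casting their dtype:

```python
# Rough comparison of VRAM used by an FP8 vs. a BF16 FLUX checkpoint.
# File names are placeholders; this does not use InvokeAI's loading path.
import torch
from safetensors.torch import load_file

def vram_after_load(path: str) -> float:
    torch.cuda.empty_cache()
    state_dict = load_file(path)  # loads tensors on CPU, keeping stored dtypes
    on_gpu = {k: v.to("cuda") for k, v in state_dict.items()}  # no dtype cast
    gib = torch.cuda.memory_allocated() / 1024**3
    del on_gpu, state_dict
    torch.cuda.empty_cache()
    return gib

print("fp8 :", vram_after_load("flux1-dev-fp8.safetensors"), "GiB")
print("bf16:", vram_after_load("flux1-dev.safetensors"), "GiB")
```

If the FP8 file really occupies roughly half the memory of the BF16 one here, the extra usage reported above would be coming from the weights being upcast at load time rather than from the checkpoint itself.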
What you expected to happen
It should run at about 20 GB or less, depending on which of the quantized (Q) models I choose.
How to reproduce the problem
No response
Additional context
No response
Discord username
No response
Yeah, because Invoke 5.0 cannot read the internal CLIP and T5 models inside the FP8 checkpoint. The speed is painfully slow now.
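If that's the suspicion, one way to check it (just a sketch; the file name is a placeholder) is to list the single-file checkpoint's keys and see whether CLIP and T5 text-encoder weights are actually bundled alongside the transformer:

```python
# Inspect a single-file FP8 FLUX checkpoint (placeholder file name) to see
# whether CLIP and T5 text-encoder weights are bundled with the transformer.
from safetensors import safe_open

with safe_open("flux1-dev-fp8.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())

clip_keys = [k for k in keys if "clip" in k.lower() or "text_encoder" in k.lower()]
t5_keys = [k for k in keys if "t5" in k.lower()]
transformer_keys = [k for k in keys if "double_blocks" in k or "single_blocks" in k]

print(f"CLIP keys: {len(clip_keys)}, T5 keys: {len(t5_keys)}, "
      f"transformer keys: {len(transformer_keys)}")
```

If those text-encoder prefixes are present but only the transformer gets imported, that would be consistent with the bundled CLIP/T5 weights being ignored.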
https://github.com/invoke-ai/InvokeAI/issues/6940
I'm running into the same situation; we need FP8 support.
I noticed this as well.