Support FP8
Recently, someone implemented fp8, which greatly reduces memory usage. This is important for AIGC, so I'm requesting fp8 support in ComfyUI, like https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14031
Thank you very much
https://github.com/comfyanonymous/ComfyUI/commit/31b0f6f3d8034371e95024d6bba5c193db79bd9d
PyTorch has two different fp8 formats implemented, so I added support for both.
You can launch ComfyUI with these arguments; one sets the CLIP/text encoder to fp8 and the other the UNET:
--fp8_e4m3fn-text-enc --fp8_e4m3fn-unet
Or you can use these for the other fp8 format, but the previous one seems to give better results:
--fp8_e5m2-text-enc --fp8_e5m2-unet
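For reference, here is a minimal sketch (assuming torch >= 2.1, where both dtypes exist) of the difference between the two formats: e4m3fn trades exponent range for mantissa precision, while e5m2 does the opposite.

```python
import torch  # the fp8 dtypes require torch >= 2.1

# float8_e4m3fn: 4 exponent bits, 3 mantissa bits -> more precision, less range
# float8_e5m2:   5 exponent bits, 2 mantissa bits -> more range, less precision
w = torch.randn(4, dtype=torch.float32)
print(w.to(torch.float8_e4m3fn).to(torch.float32))  # round-trip to see the quantization error
print(w.to(torch.float8_e5m2).to(torch.float32))
```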
Do we have to update torch manually or something? Because I just updated ComfyUI and all my custom nodes and get this error at model loading: "module 'torch' has no attribute 'float8_e4m3fn'"
Any workaround for 16xx? Got a black output.
To update torch on the standalone use: update/update_comfyui_and_python_dependencies.bat
If that doesn't work, you can grab the latest version of the standalone. It hasn't been updated for the fp8 stuff yet, so you'll also have to run: update/update_comfyui.bat
If you are not using the standalone, I recommend doing a:
pip install --upgrade torch torchvision torchaudio -r requirements.txt
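After upgrading, a quick sanity check that the installed torch build actually has the fp8 dtypes:

```python
import torch
print(torch.__version__)                # the fp8 dtypes shipped in torch 2.1+
print(hasattr(torch, "float8_e4m3fn"))  # must be True, or you'll hit the AttributeError above
```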
For the 16xx it's probably not worth it: last time I checked, fp16 was 3x slower than fp32 on those cards, so I assume it will be similar for fp8.
Do we have to download fp8 checkpoints, or does it just work?
On the A1111 side, the fp8 implementation seems to convert weights to fp8 for storage in video memory, then convert them back to fp16 for inference. This works on all models, so no additional fp8 checkpoints need to be downloaded.
You don't need fp8 checkpoints; it will auto-convert checkpoint weights to fp8 when loading them.
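To illustrate the idea, here is a rough sketch (not ComfyUI's actual loader code): the resident copy of each weight stays in fp8, and it is upcast to the compute dtype only for the duration of each operation, so memory drops roughly in half versus fp16.

```python
import torch

# Rough sketch of fp8 weight storage, not ComfyUI's actual loader code.
def load_as_fp8(weight: torch.Tensor) -> torch.Tensor:
    return weight.to(torch.float8_e4m3fn)  # half the memory of an fp16 copy

def linear_fp8(x: torch.Tensor, w_fp8: torch.Tensor) -> torch.Tensor:
    return x @ w_fp8.to(x.dtype).t()  # temporary upcast just for the matmul

w = load_as_fp8(torch.randn(1024, 1024))
y = linear_fp8(torch.randn(1, 1024), w)  # compute dtype here is fp32; fp16 on GPU
```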
Thanks all for the clarification
Hi, I'm on Apple silicon and I tried with --fp8_e4m3fn-text-enc --fp8_e4m3fn-unet after
pip install --upgrade torch torchvision torchaudio -r requirements.txt
and it gives this error:
Error occurred when executing CLIPTextEncode:
mixed dtype (CPU): expect parameter to have scalar type of Float
...
Does fp8 not work on Mac yet?
I have found that FP8 does not support LCM for some reason
After turning on fp8, lora needs a higher weight.
Anybody getting this error: RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'
I'm using ComfyUI with the Krita plug-in. I'm not sure if that's related, but I experience this problem when applying 100% denoise to a certain part of the image while inpainting.
same
> Anybody getting this error: RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'
I was experiencing this error because of a specific custom node (kohya_hiresfix.py). The problem was resolved by not using that node.
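For anyone debugging similar reports: that error typically means some code ran a matmul directly on fp8 tensors instead of upcasting first, which a custom node that patches the model can cause by bypassing the on-the-fly cast. A minimal reproduction sketch, assuming a CUDA device:

```python
import torch

w = torch.randn(8, 8, device="cuda").to(torch.float8_e4m3fn)
x = torch.randn(1, 8, device="cuda")

# x.to(torch.float8_e4m3fn) @ w     # fails: matmul kernels aren't implemented for fp8
y = x.half() @ w.to(torch.float16)  # works: upcast to fp16 before the operation
```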