
Support FP8

Open sf467 opened this issue 2 years ago • 14 comments

Recently, someone implemented fp8, which greatly reduces memory usage. This is important for AIGC, so I'd like to request fp8 support in ComfyUI, like https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14031

Thank you very much

sf467 avatar Dec 03 '23 13:12 sf467

https://github.com/comfyanonymous/ComfyUI/commit/31b0f6f3d8034371e95024d6bba5c193db79bd9d

Pytorch has two different fp8 formats implemented so I implemented support for both.

You can launch ComfyUI with these flags; one sets the CLIP/text encoder to fp8 and one sets the UNET: --fp8_e4m3fn-text-enc --fp8_e4m3fn-unet

Or you can use these for the other fp8 format, but the previous one seems to give better results: --fp8_e5m2-text-enc --fp8_e5m2-unet

comfyanonymous avatar Dec 04 '23 16:12 comfyanonymous
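For context on the two formats: e4m3fn trades exponent range for an extra mantissa bit, while e5m2 keeps an IEEE-style wide range. A quick pure-Python sketch of the largest finite value each layout can represent (the function is illustrative, not part of torch or ComfyUI):

```python
def max_finite(exp_bits: int, man_bits: int, ieee_inf: bool) -> float:
    """Largest finite value of a binary float format with the given bit counts.

    ieee_inf=True  -> top exponent code reserved for inf/NaN (e5m2, IEEE-style)
    ieee_inf=False -> "fn" variant: only the all-ones mantissa at the top
                      exponent is NaN, so one extra binade is finite (e4m3fn)
    """
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_inf:
        exp = (2 ** exp_bits - 2) - bias            # last usable exponent code
        mant = (2 ** man_bits - 1) / 2 ** man_bits
    else:
        exp = (2 ** exp_bits - 1) - bias
        mant = (2 ** man_bits - 2) / 2 ** man_bits  # all-ones mantissa is NaN
    return 2.0 ** exp * (1.0 + mant)

print(max_finite(4, 3, ieee_inf=False))  # e4m3fn -> 448.0
print(max_finite(5, 2, ieee_inf=True))   # e5m2   -> 57344.0
```

With only ±448 of range but an extra mantissa bit, e4m3fn resolves weight values more finely than e5m2, which may be why it tends to give better results here.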

Do we have to update torch manually or something? I just updated ComfyUI and all my custom nodes and get this error at model loading: "module 'torch' has no attribute 'float8_e4m3fn'"

Mikerhinos avatar Dec 04 '23 18:12 Mikerhinos
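For anyone hitting this: the fp8 dtypes were added in torch 2.1, so that error means the installed torch predates them. A minimal check (a sketch, not from the thread; the function name is made up):

```python
import importlib.util

def torch_supports_fp8() -> bool:
    """True if an installed torch exposes the fp8 dtypes (torch >= 2.1)."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed at all
    import torch
    return hasattr(torch, "float8_e4m3fn") and hasattr(torch, "float8_e5m2")

print(torch_supports_fp8())
```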

Any workaround for 16xx? Got a black output.

Hansynily avatar Dec 04 '23 18:12 Hansynily

To update torch on the standalone use: update/update_comfyui_and_python_dependencies.bat

If that doesn't work you can grab the latest version of the standalone. It hasn't been updated for the fp8 stuff yet, so you'll still have to run: update/update_comfyui.bat

If you are not using the standalone I recommend doing a: pip install --upgrade torch torchvision torchaudio -r requirements.txt

For the 16xx it's probably not worth it, last time I checked fp16 was 3x slower than fp32 so I assume it will be similar for fp8.

comfyanonymous avatar Dec 04 '23 18:12 comfyanonymous

Should we download all the fp8 checkpoints? Or does it just work?

x4080 avatar Dec 05 '23 03:12 x4080

On the A1111 side, the fp8 implementation seems to work by first converting weights to fp8 for storage in video memory, then converting them back to fp16 for inference. This works on all models; no additional fp8 checkpoints need to be downloaded.

sf467 avatar Dec 05 '23 03:12 sf467

You don't need fp8 checkpoints; it will auto-convert checkpoint weights to fp8 while loading them.

comfyanonymous avatar Dec 05 '23 04:12 comfyanonymous
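The scheme described in the two comments above (quantize weights once for storage, compute in higher precision) can be sketched in pure Python with a toy e4m3fn rounding function. This is an illustration of the idea only, not ComfyUI's actual loading code:

```python
import math

def quantize_e4m3fn(x: float) -> float:
    """Round x to the nearest e4m3fn-representable value.

    Sketch only: subnormals and NaN are ignored, and magnitudes beyond
    the format's max finite value (448) are saturated.
    """
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = min(abs(x), 448.0)                  # saturate at max finite
    e = max(math.floor(math.log2(mag)), -6)   # clamp to min normal exponent
    step = 2.0 ** (e - 3)                     # 3 mantissa bits -> 8 steps/binade
    return sign * round(mag / step) * step

# "Load" weights: quantize once for storage...
weights = [0.3, -1.0, 0.017]
stored = [quantize_e4m3fn(w) for w in weights]
# ...then compute in full precision, as the A1111 approach upcasts to fp16.
activation = [1.0, 2.0, 3.0]
out = sum(w * a for w, a in zip(stored, activation))
print(stored, out)
```

Storage memory roughly halves versus fp16 while the matmul itself still runs at the higher precision, which is why no special fp8 checkpoints are needed.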

Thanks all for the clarification

x4080 avatar Dec 05 '23 20:12 x4080

Hi, I'm on Apple silicon. I tried --fp8_e4m3fn-text-enc --fp8_e4m3fn-unet after pip install --upgrade torch torchvision torchaudio -r requirements.txt and it gives this error:

Error occurred when executing CLIPTextEncode:

mixed dtype (CPU): expect parameter to have scalar type of Float
...

Does fp8 not work on Mac yet?

x4080 avatar Dec 05 '23 20:12 x4080

I have found that FP8 does not support LCM for some reason

LuLmaster69 avatar Dec 09 '23 10:12 LuLmaster69

> I have found that FP8 does not support LCM for some reason

After turning on fp8, LoRAs need a higher weight.

sf467 avatar Dec 11 '23 04:12 sf467

Anybody getting this error: RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'

I'm using ComfyUI with the Krita plugin. I'm not sure if that's related, but the problem occurs when applying 100% denoise to a certain part of the image while inpainting.

NoMansPC avatar Dec 13 '23 14:12 NoMansPC

same

> Anybody getting this error: RuntimeError: "addmm_cuda" not implemented for 'Float8_e4m3fn'

za-wa-n-go avatar Dec 16 '23 14:12 za-wa-n-go

I was experiencing errors due to a specific custom node (kohya_hiresfix.py). The problem was resolved by not using that node.

za-wa-n-go avatar Dec 16 '23 16:12 za-wa-n-go