fp8 vs bf16

Open xizi opened this issue 9 months ago • 2 comments

Why doesn't using float8 precision give a noticeable speedup compared with bfloat16?

Apr 14 '25 05:04 xizi

Sadly, float8 does not even work properly on image-to-video models yet: https://github.com/modelscope/DiffSynth-Studio/issues/466

Apr 14 '25 07:04 FurkanGozukara

@xizi Native FP8 computation requires hardware support, which Hopper architecture GPUs provide. To stay compatible with other GPUs, we temporarily convert to bfloat16 precision during computation. As a result, FP8 quantization does not provide a speed improvement.

Apr 16 '25 02:04 Artiprocher