Liger-Kernel
[feat] FP8 Matmul Training Kernel
🚀 The feature, motivation and pitch
FP8 training has proven to be a powerful tool on H100 GPUs, offering large memory savings and speedups, and many reports have shown it to be effective, with little or no degradation in model quality. The proposed kernel is a fused FP8 matmul kernel that combines three stages:
input quantization (dynamic or static) + FP8 matmul (with dynamic or static weight quantization inside the kernel) + output dequantization
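To make the three fused stages concrete, here is a minimal, unfused reference sketch in pure Python. It assumes the E4M3 FP8 format (max finite value 448, 3 mantissa bits), per-tensor scaling, and a weight laid out as K x N; FP8 storage is only simulated by rounding floats to the E4M3 grid. The names `round_to_e4m3`, `quantize`, and `fp8_linear` are hypothetical and are not part of Liger-Kernel. Passing a scale corresponds to static quantization; passing `None` computes a dynamic scale from the tensor's absolute maximum.

```python
import math

# Assumed format: FP8 E4M3 ("fn" variant). Max finite value 448,
# smallest normal 2**-6, 3 explicit mantissa bits.
E4M3_MAX = 448.0
E4M3_MIN_NORMAL = 2.0 ** -6
MANTISSA_BITS = 3


def round_to_e4m3(v: float) -> float:
    """Simulate casting a float to E4M3: round to nearest (ties to even), saturate at 448."""
    a = abs(v)
    if a == 0.0:
        return 0.0
    if a < E4M3_MIN_NORMAL:
        step = 2.0 ** (-6 - MANTISSA_BITS)  # subnormal spacing (2**-9)
    else:
        exp = math.frexp(a)[1] - 1  # floor(log2(a))
        step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values in this binade
    q = min(round(a / step) * step, E4M3_MAX)  # Python round() is round-half-to-even
    return math.copysign(q, v)


def quantize(mat, scale=None):
    """Per-tensor quantization. scale=None -> dynamic (amax-based); otherwise static."""
    if scale is None:
        amax = max(abs(x) for row in mat for x in row)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
    q = [[round_to_e4m3(x / scale) for x in row] for row in mat]
    return q, scale


def fp8_linear(x, w, x_scale=None, w_scale=None):
    """Reference for: quantize input -> FP8 matmul -> dequantize output.
    x is M x K, w is K x N (nested lists)."""
    qx, sx = quantize(x, x_scale)  # stage 1: input quantization
    qw, sw = quantize(w, w_scale)  # weight quantization (done inside the fused kernel)
    n = len(w[0])
    out = []
    for row in qx:
        # stage 2: matmul on FP8-valued operands, accumulated in Python float
        acc = [sum(row[k] * qw[k][j] for k in range(len(row))) for j in range(n)]
        # stage 3: dequantize by the product of the two per-tensor scales
        out.append([a * sx * sw for a in acc])
    return out


print(fp8_linear([[1.0, -2.0], [0.5, 3.0]], [[0.25, 1.0], [-1.5, 2.0]]))
```

A real kernel (for example, one written in Triton) would perform all three stages in a single launch so the quantized operands never round-trip through global memory; this sketch only illustrates the arithmetic. Per-tensor scaling is one choice among several, and per-row or per-block scales are common alternatives.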
Alternatives
No response
Additional context
No response