
[feat] FP8 Matmul Training Kernel

Open qingquansong opened this issue 1 year ago • 0 comments

🚀 The feature, motivation and pitch

FP8 training is a powerful tool on H100: it provides large memory and speed benefits and has been shown to be effective (with limited or no performance drop) in many cases and reports. The proposed kernel is a fused FP8 matmul kernel that combines three parts:

input quantization (dynamic or static) + FP8 matmul (with dynamic or static weight quantization inside) + output dequantization
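To make the three stages concrete, here is a minimal pure-Python reference sketch of the numerics. It is not the proposed kernel and not an existing Liger-Kernel API: the function names (`round_to_e4m3`, `quantize`, `fp8_matmul_reference`) are hypothetical, and it assumes the E4M3 format (max value 448, 3 mantissa bits), per-tensor scaling, and saturating rounding, none of which are specified in this issue. A real kernel would fuse these steps on the GPU rather than run them separately.

```python
import math

E4M3_MAX = 448.0  # assumption: FP8 E4M3 format, largest finite value is 448


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # floor(log2(a)), clamped to the smallest normal exponent (-6)
    exponent = max(math.frexp(a)[1] - 1, -6)
    # 3 mantissa bits: 8 representable steps per power of two
    step = 2.0 ** (exponent - 3)
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize(mat, scale=None):
    """Per-tensor quantization (assumption). scale=None means dynamic scaling."""
    if scale is None:
        # dynamic: derive the scale from the current absolute maximum
        amax = max(abs(v) for row in mat for v in row)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
    q = [[round_to_e4m3(v / scale) for v in row] for row in mat]
    return q, scale


def fp8_matmul_reference(x, w, x_scale=None, w_scale=None):
    """Quantize input and weight, multiply, then dequantize the output."""
    xq, sx = quantize(x, x_scale)  # stage 1: input quantization
    wq, sw = quantize(w, w_scale)  # weight quantization inside the matmul
    rows, inner, cols = len(x), len(w), len(w[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # stage 2: multiply FP8 values, accumulate in higher precision
            acc = sum(xq[i][p] * wq[p][j] for p in range(inner))
            # stage 3: output dequantization with the combined scale
            out_row.append(acc * sx * sw)
        out.append(out_row)
    return out
```

Passing `x_scale` or `w_scale` explicitly models static quantization (a precomputed scale), while leaving them as `None` models dynamic quantization, where the scale is recomputed from the tensor on each call.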

Alternatives

No response

Additional context

No response

qingquansong, Aug 24 '24 08:08