Liger-Kernel
[feat] Int8 Matmul Training kernel
🚀 The feature, motivation and pitch
W8A8 matmul (int8 for both weights and activations) is beneficial on the A100 and could provide significant memory and speed benefits. It could also be effective for mixed-precision training, given a dedicated design that avoids convergence issues and mitigates the performance drop. The kernel would be a fused int8 matmul kernel that fuses three parts:
input activation quantization to int8 (dynamic or static) + int8 matmul (with dynamic or static int8 weight quantization inside) + output dequantization
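To make the three fused stages concrete, here is a minimal unfused NumPy reference of the arithmetic such a kernel would need to reproduce. It is only a sketch under stated assumptions, not the proposed kernel: the names `quantize_int8` and `w8a8_matmul_reference` are hypothetical, and the choices of symmetric absmax scaling, per-row activation scales, and per-column weight scales are assumptions made for illustration (the request leaves the dynamic versus static choice open). Only the dynamic case is shown.

```python
import numpy as np


def quantize_int8(x, axis):
    """Symmetric dynamic quantization (assumed scheme): int8 values plus float32 scales."""
    absmax = np.max(np.abs(x), axis=axis, keepdims=True)
    # Guard against all-zero rows/columns so the division below is safe.
    scale = np.where(absmax > 0, absmax / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale


def w8a8_matmul_reference(x, w):
    """Unfused reference for: quantize activations -> int8 matmul -> dequantize."""
    # 1) Activation quantization, one scale per row of x (shape (M, 1)).
    x_q, x_scale = quantize_int8(x, axis=1)
    # 2) Weight quantization, one scale per column of w (shape (1, N)).
    w_q, w_scale = quantize_int8(w, axis=0)
    # 3) Integer matmul with int32 accumulation to avoid int8 overflow.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    # 4) Output dequantization: rescale by the outer product of the scales.
    return acc.astype(np.float32) * x_scale * w_scale


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 8)).astype(np.float32)

y_ref = x @ w
y_q = w8a8_matmul_reference(x, w)
rel_err = np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)
print(f"relative error vs. float32 matmul: {rel_err:.4f}")
```

A fused kernel would perform steps 1 through 4 in a single pass so that the int8 tensors and the int32 accumulator never round-trip through global memory. A reference like this could serve as a correctness check for such a kernel.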
Alternatives
No response
Additional context
No response