[feat] Int8 Matmul Training kernel

Open qingquansong opened this issue 1 year ago • 0 comments

🚀 The feature, motivation and pitch

W8A8 matmul (int8 for both weights and activations) is beneficial on A100 and could provide significant memory and speed benefits. It could also be effective for training, given a dedicated mixed-precision training design that avoids convergence issues and mitigates the performance drop. The proposed kernel is a fused int8 matmul kernel that combines three parts:

input activation quantization to int8 (dynamic or static) + int8 matmul (with dynamic or static int8 weight quantization done inside the kernel) + output dequantization
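
To make the three parts concrete, here is a minimal unfused NumPy sketch of the forward computation a fused kernel would need to match numerically. It is only an illustration under assumptions not stated in this issue: symmetric absmax scaling, dynamic quantization for both operands, per-row scales for activations and per-column scales for weights. The names `quantize_absmax` and `int8_matmul_reference` are hypothetical, and the backward pass needed for training is not covered.

```python
import numpy as np

def quantize_absmax(x, axis):
    """Symmetric dynamic int8 quantization with one scale per slice along `axis`."""
    absmax = np.max(np.abs(x), axis=axis, keepdims=True)
    # Guard against all-zero slices so we never divide by zero.
    scale = np.where(absmax > 0, absmax / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul_reference(x, w):
    """Unfused reference: quantize -> int8 matmul (int32 accumulate) -> dequantize."""
    # Part 1: activation quantization, one scale per row of x (assumption).
    x_q, x_scale = quantize_absmax(x, axis=1)   # x_q: (M, K), x_scale: (M, 1)
    # Part 2a: weight quantization, one scale per column of w (assumption).
    w_q, w_scale = quantize_absmax(w, axis=0)   # w_q: (K, N), w_scale: (1, N)
    # Part 2b: integer matmul; accumulate in int32 to avoid int8 overflow.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)   # (M, N)
    # Part 3: dequantize with the outer product of the two scale vectors.
    return acc.astype(np.float32) * x_scale * w_scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 8)).astype(np.float32)

y = int8_matmul_reference(x, w)
ref = x @ w
rel_err = float(np.linalg.norm(y - ref) / np.linalg.norm(ref))
print(f"relative error vs fp32 matmul: {rel_err:.4f}")
```

Because each scale is constant along the reduction dimension K, it factors out of the integer matmul, which is what lets dequantization happen once on the output tile. A fused kernel would do all three steps in one pass without writing the int8 intermediates back to global memory.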

Alternatives

No response

Additional context

No response

Aug 24 '24 08:08 qingquansong