oneDNN
rfcs: add external precomputed zero points for gemm
This is a proposal for user-supplied precomputed zero points, primarily intended for boosting GEMM performance in LLMs.
@dzarukin, @mgouicem, @vpirogov, please review.
Related JIRAs: MFDNN-12757, MFDNN-13500
Looks good to me. Just in case, please note that A is also grouped and group_size(A) may be different from group_size(B).
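To make the group-size remark concrete, here is a minimal sketch of the arithmetic in plain Python. It is not the API proposed in the RFC and uses no oneDNN calls. All names (`ga`, `gb`, `precompute_weight_reductions`, and so on) are hypothetical, and the sketch assumes zero points are grouped along K for both A and B, with K divisible by both group sizes. It shows one way a weight-side reduction can be computed once over A's groups even when group_size(A) differs from group_size(B).

```python
# Illustrative only: plain-Python arithmetic, not oneDNN API.
# Assumptions: A is M x K, B is K x N, zero points are grouped along K,
# za has shape M x (K // ga), zb has shape (K // gb) x N.

def gemm_reference(A, B, za, zb, ga, gb):
    """C[m][n] = sum_k (A[m][k] - za[m][k//ga]) * (B[k][n] - zb[k//gb][n])."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum((A[m][k] - za[m][k // ga]) * (B[k][n] - zb[k // gb][n])
                 for k in range(K))
             for n in range(N)]
            for m in range(M)]

def precompute_weight_reductions(B, zb, ga, gb):
    """Sum (B - zb) over each of A's groups along K.

    The result depends only on the weights, their zero points, and A's
    group size, so it can be computed once and reused across calls.
    """
    K, N = len(B), len(B[0])
    return [[sum(B[k][n] - zb[k // gb][n] for k in range(g * ga, (g + 1) * ga))
             for n in range(N)]
            for g in range(K // ga)]

def gemm_with_precomputed(A, B, za, zb, ga, gb, S):
    """Same result as gemm_reference, with A's zero points applied via S."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            # Main product: A is used as-is, only B's zero point is applied.
            acc = sum(A[m][k] * (B[k][n] - zb[k // gb][n]) for k in range(K))
            # Correction for A's zero points: one multiply per group of A.
            acc -= sum(za[m][g] * S[g][n] for g in range(K // ga))
            C[m][n] = acc
    return C

# Small example: K = 8, group_size(A) = 4, group_size(B) = 2.
A = [[1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]]
B = [[k + n for n in range(2)] for k in range(8)]
za = [[1, 2], [3, 0]]                      # 2 groups along K for A
zb = [[0, 1], [2, 3], [1, 0], [4, 2]]      # 4 groups along K for B
S = precompute_weight_reductions(B, zb, ga=4, gb=2)
assert gemm_with_precomputed(A, B, za, zb, 4, 2, S) == gemm_reference(A, B, za, zb, 4, 2)
```

The reduction is taken over A's groups rather than B's, so the two group sizes never need to match; only divisibility of K by each is assumed here.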
We'll need to address a few things before this RFC gets approved:
- Demonstrate performance benefits of the proposed approach in a PoC with OpenVINO.
- Address the considerations shared in https://github.com/uxlfoundation/oneDNN/pull/3222#issuecomment-3008185206.

In the end, the goal of this process is to make sure we have an API that will be stable.
Before this happens, we have several options to get this feature into OpenVINO's hands.