onnxruntime
[webgpu-native] opt matmulnbits
Description
Motivation and Context
All native test cases pass now. @guschmue @fs-eire Please take a look, thanks.
I also tested this and ran a perf comparison. It looks good on Xe: Phi3 on tlk went from 8.5 to 13.1 tokens/sec.