GRYS

Results 2 issues of GRYS

### Details

The performance of the Zdot function is too low compared to other vector operations (axpy, vecmul). According to the perf_math_kernel tests, several BLAS functions have the following performance results:...

Performance

In AWQ inference, the quantized weight matrix is dequantized to fp16 and then multiplied by the input matrix `x` in the linear layer. However, I tried to directly replace the...
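The dequantize-then-multiply path described above can be illustrated with a minimal pure-Python sketch. It is not the kernel of any AWQ implementation. The function names (`to_fp16`, `dequantize`, `linear`), the unpacked 4-bit codes, and the per-row group layout of `scales` and `zeros` are all assumptions made for illustration; real implementations typically pack several codes into one integer and run the work on the GPU.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def dequantize(q_weight, scales, zeros, group_size):
    """Map integer codes back to fp16 weights.

    Assumed layout (hypothetical): q_weight is (out, in); scales and zeros
    are (out, in // group_size), one entry per group of input columns.
    """
    rows = []
    for r, q_row in enumerate(q_weight):
        row = []
        for c, q in enumerate(q_row):
            g = c // group_size
            row.append(to_fp16((q - zeros[r][g]) * scales[r][g]))
        rows.append(row)
    return rows


def linear(x, weight):
    """Compute x @ weight.T for x of shape (batch, in), weight of shape (out, in)."""
    return [
        [sum(xi * wi for xi, wi in zip(x_row, w_row)) for w_row in weight]
        for x_row in x
    ]


# Toy data: 2 output features, 4 input features, 4-bit codes in 0..15.
q_weight = [[0, 15, 8, 8], [4, 12, 0, 15]]
scales = [[0.5, 0.25], [1.0, 0.5]]
zeros = [[8, 8], [8, 8]]
x = [[1.0, 2.0, 3.0, 4.0]]

w = dequantize(q_weight, scales, zeros, group_size=2)  # step 1: dequantize
y = linear(x, w)                                       # step 2: multiply by x
print(w)
print(y)
```

The sketch keeps the two steps separate on purpose, so that the cost of materializing the full fp16 weight matrix before the multiplication is visible.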