Yuxuan Hu comments

Results: 3 comments by Yuxuan Hu

High PPL when groupsize != -1 for OPT model after replace linear layer with quantlinear.

> I tried to test GPTQ's PPL metrics on the opt model via opt.py. The PPL metrics of the opt model are normal with the use of fake quantization. However,...

[QST] How to implement a fused mixed precision matrix multiplication such as w4a4 + w16a16?

Thank you very much for your reply! The input consists of two activations X1[L, D1], X2[L, D2] and two weight matrices W1[D, D1], W2[D, D2], where $L = 2048, D_1...
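
A minimal reference sketch of the computation this comment appears to describe. The excerpt is cut off, so several things here are assumptions rather than the author's method: that the output is Y = X1 W1^T + X2 W2^T with shape [L, D], that the w4a4 branch applies to X1 and W1, and that 4-bit quantization is symmetric and per-tensor. The names `fake_quant_int4`, `matmul_nt`, and `mixed_precision_reference` are hypothetical. A fused kernel would presumably accumulate both products into the same output tile rather than run them separately as done here.

```python
def fake_quant_int4(mat):
    """Symmetric per-tensor 4-bit fake quantization (assumed scheme):
    round onto the integer grid [-8, 7], then scale back to floats."""
    max_abs = max((abs(v) for row in mat for v in row), default=0.0)
    if max_abs == 0.0:
        return [row[:] for row in mat]
    scale = max_abs / 7.0
    return [[max(-8, min(7, round(v / scale))) * scale for v in row] for row in mat]


def matmul_nt(x, w):
    """Compute x @ w^T for x[L, K] and w[D, K]; the result has shape [L, D]."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


def mixed_precision_reference(x1, w1, x2, w2):
    """Y = Q4(X1) @ Q4(W1)^T + X2 @ W2^T, summed into one [L, D] output."""
    low = matmul_nt(fake_quant_int4(x1), fake_quant_int4(w1))  # w4a4 branch
    high = matmul_nt(x2, w2)                                   # w16a16 branch
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(low, high)]
```

A reference like this is mainly useful for checking a fused kernel's output numerically on small shapes before moving to sizes such as L = 2048.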

Accelerating custom sparsity patterns on GPU with triton

I have the same question. Also, if sparse tensor cores are not currently supported, could we implement a Triton kernel that loads the weights in sparse form and computes in dense form?
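
The "load in sparse, compute in dense" idea can be sketched in plain Python to make the data flow concrete. This is not Triton code and not an existing API: it assumes a 2:4 pattern (two kept values per group of four) as the storage format, and the names `compress_2_4`, `expand_to_dense`, and `sparse_load_dense_matmul` are hypothetical. In a real kernel the expansion step would happen on chip after loading the compressed tile, followed by an ordinary dense dot product.

```python
def compress_2_4(w):
    """Keep the 2 largest-magnitude entries in every group of 4 along each row.
    Returns the kept values and their positions (0..3) within each group."""
    values, indices = [], []
    for row in w:
        assert len(row) % 4 == 0, "row length must be a multiple of 4"
        row_vals, row_idx = [], []
        for g in range(0, len(row), 4):
            group = row[g:g + 4]
            keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
            row_vals.extend(group[i] for i in keep)
            row_idx.extend(keep)
        values.append(row_vals)
        indices.append(row_idx)
    return values, indices


def expand_to_dense(row_vals, row_idx, k):
    """'Load in sparse': rebuild a dense row of length k from compressed storage."""
    dense = [0.0] * k
    for j, (v, i) in enumerate(zip(row_vals, row_idx)):
        dense[(j // 2) * 4 + i] = v  # j // 2 is the group number
    return dense


def sparse_load_dense_matmul(x, values, indices, k):
    """'Compute in dense': x @ W^T where W[D, k] is stored in compressed form."""
    w_dense = [expand_to_dense(v, i, k) for v, i in zip(values, indices)]
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w_dense] for x_row in x]
```

With this layout the memory traffic for the weights is roughly halved (plus index metadata), while the arithmetic stays dense, so any benefit would come from bandwidth rather than from skipped multiplications.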