Question: any plan to formally support smooth quantization and make it more general
Awesome work!
I noticed there is a SmoothQuant implementation under external. Currently, it seems to be model-specific: smoothing can only be applied to special Linear layers.
In general, however, smoothing can be applied to any Linear by inserting a channel-wise mul in front of it. Are there any plans to officially support smooth quantization in-tree? My initial thought was: would it be possible to define a SmoothTensor and use __torch_dispatch__ to override the bmm behavior?
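To make the "insert a mul" idea concrete, here is a minimal, hypothetical sketch (not quanto's API): dividing the activations by per-channel factors and folding those same factors into the Linear's weight columns leaves the output mathematically unchanged, which is why the trick works for any Linear.

```python
import torch

# Hypothetical sketch: smoothing an arbitrary nn.Linear by inserting a
# channel-wise rescaling on its input. Dividing the activations by
# per-channel factors `s` while multiplying the weight columns by `s`
# leaves the Linear's output unchanged.
torch.manual_seed(0)
in_features, out_features = 8, 4
linear = torch.nn.Linear(in_features, out_features)
x = torch.randn(2, in_features)

# Per-input-channel smoothing factors (arbitrary positive values here;
# in SmoothQuant they come from activation/weight statistics).
s = torch.rand(in_features) + 0.5

smoothed = torch.nn.Linear(in_features, out_features)
with torch.no_grad():
    smoothed.weight.copy_(linear.weight * s)  # fold s into weight columns
    smoothed.bias.copy_(linear.bias)

y_ref = linear(x)
y_smooth = smoothed(x / s)  # the inserted channel-wise mul (by 1/s)
assert torch.allclose(y_ref, y_smooth, atol=1e-5)
```

Since the transformation is exact in full precision, the only behavioral change comes later, when the rescaled (flatter) activations are quantized.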
> I noticed there is a SmoothQuant implementation under external. Currently, it seems to be model-specific: smoothing can only be applied to special Linear layers.
This is a copy/paste from the SmoothQuant repo that I quickly hacked together for my tests. Feel free to improve it if you want to make it model-agnostic, I'd be happy to merge it!
> Are there any plans to officially support smooth quantization in-tree?
I am not sure exactly what you mean by that, but I do have in mind to dynamically smooth the activations by projecting the smoothing factors channel-wise onto the weights of the Linear during the calibration phase. The activations would then be rescaled channel-wise by the smoothing factors before being quantized with a scalar scale.
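A rough sketch of what this could look like, under stated assumptions: the factor formula follows the SmoothQuant paper (s_j = max|X_j|^α / max|W_j|^(1−α)), and smooth_linear, act_absmax, and alpha are illustrative names, not quanto's actual API.

```python
import torch

def smooth_linear(linear, act_absmax, alpha=0.5, eps=1e-5):
    """Hypothetical helper: fold channel-wise smoothing factors into
    `linear` in place and return the factors for the activation side."""
    w_absmax = linear.weight.abs().amax(dim=0)  # max per input channel
    s = (act_absmax.clamp(min=eps) ** alpha
         / w_absmax.clamp(min=eps) ** (1 - alpha)).clamp(min=eps)
    with torch.no_grad():
        linear.weight.mul_(s)  # project the factors onto the weights
    return s

torch.manual_seed(0)
linear = torch.nn.Linear(8, 4)
# Simulate calibration data with outlier channels of very different ranges.
x = torch.randn(16, 8) * torch.linspace(0.1, 10.0, 8)
y_ref = linear(x)

# Calibration: derive factors from observed per-channel activation maxima.
s = smooth_linear(linear, x.abs().amax(dim=0))
x_smoothed = x / s  # channel-wise rescaling before quantization
assert torch.allclose(y_ref, linear(x_smoothed), atol=1e-4)  # exact in fp32

# The flattened activations can now share a single scalar (per-tensor) scale.
scale = x_smoothed.abs().max() / 127
x_q = torch.clamp((x_smoothed / scale).round(), -127, 127) * scale
y_quant = linear(x_q)  # the weights already carry the smoothing factors
```

The point of the rescaling is the last step: with outlier channels tamed, a single scalar scale quantizes the activations with far less clipping error than it would on the raw tensor.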
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.