Question: any plan to formally support smooth quantization and make it more general
Awesome work!
I noticed there is a SmoothQuant implementation under external. Currently, it seems to be model-specific: smoothing can only be applied to special Linear layers.
In general, however, smoothing can be applied to any Linear by inserting a channel-wise mul in front of it. Are there any plans to officially support smooth quantization in-tree? My initial thought was: would it be possible to define a SmoothTensor and use __torch_dispatch__ to override the bmm behavior?
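To make the "insert a mul" idea concrete, here is a minimal, hypothetical sketch (not quanto's API): dividing the activations by per-channel factors and folding those same factors into the Linear's weight columns leaves the output mathematically unchanged, which is why the trick works for any Linear.

```python
import torch

# Hypothetical sketch: smoothing an arbitrary nn.Linear by inserting a
# channel-wise rescaling on its input. Dividing the activations by
# per-channel factors `s` while multiplying the weight columns by `s`
# leaves the Linear's output unchanged.
torch.manual_seed(0)
in_features, out_features = 8, 4
linear = torch.nn.Linear(in_features, out_features)
x = torch.randn(2, in_features)

# Per-input-channel smoothing factors (arbitrary positive values here;
# in SmoothQuant they come from activation/weight statistics).
s = torch.rand(in_features) + 0.5

smoothed = torch.nn.Linear(in_features, out_features)
with torch.no_grad():
    smoothed.weight.copy_(linear.weight * s)  # fold s into weight columns
    smoothed.bias.copy_(linear.bias)

y_ref = linear(x)
y_smooth = smoothed(x / s)  # the inserted channel-wise mul (by 1/s)
assert torch.allclose(y_ref, y_smooth, atol=1e-5)
```

Since the transformation is exact in full precision, the only behavioral change comes later, when the rescaled (flatter) activations are quantized.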
> I noticed there is a SmoothQuant implementation under external. Currently, it seems to be model-specific: smoothing can only be applied to special Linear layers.
This is a copy/paste from the SmoothQuant repo that I quickly hacked together for my tests. Feel free to improve it if you want to make it model-agnostic, I'd be happy to merge it!
> Are there any plans to officially support smooth quantization in-tree?
I am not sure exactly what you mean by that, but I do have in mind to dynamically smooth the activations by projecting the smoothing factors channel-wise onto the weights of the Linear during the calibration phase. The activations would then be rescaled channel-wise by the smoothing factors before being quantized with a scalar scale.
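A rough sketch of what this could look like, under stated assumptions: the factor formula follows the SmoothQuant paper (s_j = max|X_j|^α / max|W_j|^(1−α)), and smooth_linear, act_absmax, and alpha are illustrative names, not quanto's actual API.

```python
import torch

def smooth_linear(linear, act_absmax, alpha=0.5, eps=1e-5):
    """Hypothetical helper: fold channel-wise smoothing factors into
    `linear` in place and return the factors for the activation side."""
    w_absmax = linear.weight.abs().amax(dim=0)  # max per input channel
    s = (act_absmax.clamp(min=eps) ** alpha
         / w_absmax.clamp(min=eps) ** (1 - alpha)).clamp(min=eps)
    with torch.no_grad():
        linear.weight.mul_(s)  # project the factors onto the weights
    return s

torch.manual_seed(0)
linear = torch.nn.Linear(8, 4)
# Simulate calibration data with outlier channels of very different ranges.
x = torch.randn(16, 8) * torch.linspace(0.1, 10.0, 8)
y_ref = linear(x)

# Calibration: derive factors from observed per-channel activation maxima.
s = smooth_linear(linear, x.abs().amax(dim=0))
x_smoothed = x / s  # channel-wise rescaling before quantization
assert torch.allclose(y_ref, linear(x_smoothed), atol=1e-4)  # exact in fp32

# The flattened activations can now share a single scalar (per-tensor) scale.
scale = x_smoothed.abs().max() / 127
x_q = torch.clamp((x_smoothed / scale).round(), -127, 127) * scale
y_quant = linear(x_q)  # the weights already carry the smoothing factors
```

The point of the rescaling is the last step: with outlier channels tamed, a single scalar scale quantizes the activations with far less clipping error than it would on the raw tensor.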
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.