[Feature]: Consider integrating SVDQuant (W4A4 quantization) from the Nunchaku project

Open dengyingxu opened this issue 1 year ago • 0 comments

🚀 The feature, motivation and pitch

I noticed that the Nunchaku project (https://github.com/mit-han-lab/nunchaku) implements SVDQuant. Its W4A4 approach (4-bit weights, 4-bit activations) seems highly compatible with LLM scenarios and looks promising for model optimization.
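To make the idea concrete, here is a rough NumPy sketch of how I understand the core of SVDQuant: a small low-rank branch is kept in high precision to absorb the largest components of the weight, and only the residual goes through 4-bit quantization together with the activations. This is a simplified illustration and not Nunchaku's actual code or API. The function names (`quantize_int4`, `svd_lowrank_split`, `w4a4_linear`) are hypothetical, the quantization is per-tensor and symmetric for brevity, and the activation-smoothing step of the real method is omitted.

```python
import numpy as np


def quantize_int4(x):
    """Symmetric per-tensor 4-bit quantization (simplifying assumption).

    Returns integer codes in [-8, 7] and the scale needed to dequantize.
    """
    max_abs = np.max(np.abs(x))
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale


def svd_lowrank_split(w, rank):
    """Split w (out, in) into a rank-`rank` part l1 @ l2 and a residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    l1 = u[:, :rank] * s[:rank]  # shape (out, rank)
    l2 = vt[:rank, :]            # shape (rank, in)
    return l1, l2, w - l1 @ l2


def w4a4_linear(x, w, rank=8):
    """Approximate x @ w.T with a 4-bit residual plus a high-precision low-rank branch."""
    l1, l2, residual = svd_lowrank_split(w, rank)
    qw, sw = quantize_int4(residual)  # 4-bit weights (W4)
    qx, sx = quantize_int4(x)         # 4-bit activations (A4)
    # Integer matmul accumulated in int32, then rescaled back to floating point.
    low_bit = (qx.astype(np.int32) @ qw.astype(np.int32).T) * (sx * sw)
    # Low-rank branch runs on the unquantized activations.
    high_precision = (x @ l2.T) @ l1.T
    return low_bit + high_precision
```

Setting `rank=0` in this sketch degenerates to plain W4A4, which makes it easy to compare the output error with and without the low-rank branch on a given weight matrix.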

Would Aphrodite Engine consider supporting or integrating this quantization method? It could potentially offer significant benefits for memory efficiency while maintaining model performance in LLM serving scenarios.

The Nunchaku project’s implementation appears to be well-suited for LLM use cases, and integration could be valuable for the Aphrodite Engine community.

Alternatives

No response

Additional context

No response

Jan 24 '25 09:01 dengyingxu