aphrodite-engine
[Feature]: Consider integrating SVDQuant (W4A4 quantization) from the Nunchaku project
🚀 The feature, motivation and pitch
I noticed that the Nunchaku project (https://github.com/mit-han-lab/nunchaku) implements SVDQuant, a W4A4 (4-bit weight, 4-bit activation) quantization method. The approach looks like it could carry over to LLM inference, and it seems promising for model optimization.
Would Aphrodite Engine consider supporting or integrating this quantization method? It could potentially offer significant benefits for memory efficiency while maintaining model performance in LLM serving scenarios.
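As a rough illustration of the memory argument, here is a back-of-the-envelope estimate of weight storage at 16-bit versus 4-bit precision. The 7B parameter count is a hypothetical example, not a specific model, and the estimate ignores quantization scales, any higher-precision side branches, activations, and the KV cache, so real savings would be smaller.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB, ignoring scales and other overhead."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 7_000_000_000  # hypothetical model size, for illustration only

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 16-bit baseline
w4_gib = weight_memory_gib(NUM_PARAMS, 4)     # 4-bit quantized weights

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {w4_gib:.1f} GiB")
print(f"reduction:      {fp16_gib / w4_gib:.0f}x")
```

This only covers the weight (W4) side; the 4-bit activation (A4) side mainly matters for compute throughput and is not captured here.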
The Nunchaku project’s implementation appears to be well-suited for LLM use cases, and integration could be valuable for the Aphrodite Engine community.
Alternatives
No response
Additional context
No response