[New feature] mlc-llm support
https://github.com/mlc-ai/mlc-llm https://github.com/mlc-ai/llm-perf-bench
This issue (and these repos) feel pretty dead. What's happening? Are the maintainers working on something that obsoletes Medusa (https://github.com/FasterDecoding/REST)? Is the roadmap still active? @ctlllll @leeyeehoo
Indeed we are working on Medusa... will release a new version soon :)
Thanks Yuhong =) Looking forward to it!!
I want to ask for some advice regarding model performance. My goal is to run a custom model on pretty cheap, OpenCL-compatible hardware. Using MLC, the current speed is ~3 tok/s, which is insufficient for fluid interaction.
What would you recommend? Getting Medusa to work on MLC would help a lot, and I would love to try working on it (though it looks pretty daunting). However, if Medusa v2 is coming out, perhaps I should just wait?
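For context, here's roughly how I'm measuring the ~3 tok/s figure. This is a minimal sketch using mlc_chat's Python `ChatModule` API; the model path, quantization suffix, and prompt are placeholders for my setup, and the exact constructor arguments may differ across mlc-llm versions:

```python
from mlc_chat import ChatModule

# Placeholder path to a locally compiled model; substitute your own artifact.
cm = ChatModule(
    model="dist/my-custom-model-q4f16_1",  # hypothetical compiled model dir
    device="opencl",                       # target the OpenCL backend
)

# Generate a short completion, then print MLC's built-in runtime statistics,
# which include prefill and decode throughput in tok/s.
output = cm.generate(prompt="Explain speculative decoding in one sentence.")
print(output)
print(cm.stats())
```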
You can refer to this branch; there is no significant change on the model side. We are working on fully finetuning the models (including new models like Zephyr) and will release the finetuning recipe. Supporting more libraries will be the next step.
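For anyone wondering what "no significant change on the model side" means in practice: Medusa adds a few lightweight decoding heads on top of the base model's last hidden state, where head k predicts the token k+1 steps ahead. Here is a minimal PyTorch sketch of that architecture as described in the Medusa paper; the sizes and head count below are illustrative, not the released configuration:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block used inside each Medusa head: x + SiLU(W x)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.act(self.linear(x))

class MedusaHeads(nn.Module):
    """K extra decoding heads on top of the base model's hidden states.

    Head k produces logits for the token k+1 positions ahead, so the base
    LM head plus these heads propose several future tokens per step.
    """
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                ResBlock(hidden_size),
                nn.Linear(hidden_size, vocab_size, bias=False),
            )
            for _ in range(num_heads)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden) -> logits: (num_heads, batch, seq, vocab)
        return torch.stack([head(hidden) for head in self.heads])

# Illustrative sizes roughly matching a Llama-7B-class model.
heads = MedusaHeads(hidden_size=4096, vocab_size=32000, num_heads=4)
h = torch.randn(1, 8, 4096)
print(heads(h).shape)  # torch.Size([4, 1, 8, 32000])
```

Because the heads are just a residual block plus a linear layer each, porting them to another runtime is mostly a matter of exporting a few extra matmuls; the harder part is the tree-attention verification step in the decoding loop.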
Thanks for the heads up! If you have a chance, please also include a recipe for adding new types of models.
Check this. Not sure if this is the template for adding new model support that you were asking about?
Yes! Thanks for pointing that out.