[New feature] mlc-llm support
https://github.com/mlc-ai/mlc-llm https://github.com/mlc-ai/llm-perf-bench
This issue (and these repos) feel pretty dead. What's happening? Are the maintainers working on something that obsoletes Medusa (https://github.com/FasterDecoding/REST)? Is the roadmap still active? @ctlllll @leeyeehoo
Indeed we are working on Medusa... will release a new version soon :)
Thanks Yuhong =) Looking forward to it!!
I want to ask for some advice regarding model performance. My goal is to run a custom model on pretty cheap, OpenCL-compatible hardware. Using MLC, the current speed is ~3 tok/s, which is insufficient for fluid interaction.
What would you recommend? Getting Medusa to work on MLC would help a lot, and I would love to try working on it (though it looks pretty daunting). However, if Medusa v2 is coming out, perhaps I should just wait?
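For context, here's roughly how I'm measuring the ~3 tok/s figure. This is a minimal sketch using mlc_chat's Python `ChatModule` API; the model path, quantization suffix, and prompt are placeholders for my setup, and the exact constructor arguments may differ across mlc-llm versions:

```python
from mlc_chat import ChatModule

# Placeholder path to a locally compiled model; substitute your own artifact.
cm = ChatModule(
    model="dist/my-custom-model-q4f16_1",  # hypothetical compiled model dir
    device="opencl",                       # target the OpenCL backend
)

# Generate a short completion, then print MLC's built-in runtime statistics,
# which include prefill and decode throughput in tok/s.
output = cm.generate(prompt="Explain speculative decoding in one sentence.")
print(output)
print(cm.stats())
```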
You can refer to this branch; there is no significant change on the model side. We are working on fully finetuning the models (including new models like Zephyr) and will release the finetuning recipe. Supporting more libraries will be the next step.
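For anyone wondering what "no significant change on the model side" means in practice: Medusa adds a few lightweight decoding heads on top of the base model's last hidden state, where head k predicts the token k+1 steps ahead. Here is a minimal PyTorch sketch of that architecture as described in the Medusa paper; the sizes and head count below are illustrative, not the released configuration:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block used inside each Medusa head: x + SiLU(W x)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.act(self.linear(x))

class MedusaHeads(nn.Module):
    """K extra decoding heads on top of the base model's hidden states.

    Head k produces logits for the token k+1 positions ahead, so the base
    LM head plus these heads propose several future tokens per step.
    """
    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                ResBlock(hidden_size),
                nn.Linear(hidden_size, vocab_size, bias=False),
            )
            for _ in range(num_heads)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden) -> logits: (num_heads, batch, seq, vocab)
        return torch.stack([head(hidden) for head in self.heads])

# Illustrative sizes roughly matching a Llama-7B-class model.
heads = MedusaHeads(hidden_size=4096, vocab_size=32000, num_heads=4)
h = torch.randn(1, 8, 4096)
print(heads(h).shape)  # torch.Size([4, 1, 8, 32000])
```

Because the heads are just a residual block plus a linear layer each, porting them to another runtime is mostly a matter of exporting a few extra matmuls; the harder part is the tree-attention verification step in the decoding loop.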
Thanks for the heads up! If you have a chance, please also include a recipe for adding new types of models.
Check this. Not sure if this is the template for adding new model support that you were asking about?
Yes! Thanks for pointing that out.