[New feature] mlc-llm support

Open ctlllll opened this issue 2 years ago • 8 comments

https://github.com/mlc-ai/mlc-llm https://github.com/mlc-ai/llm-perf-bench

ctlllll avatar Sep 18 '23 15:09 ctlllll

This issue (and repos) feels pretty dead. What's happening? Are the maintainers working on something that obsoletes Medusa (https://github.com/FasterDecoding/REST)? Is the roadmap still active? @ctlllll @leeyeehoo

kmn1024 avatar Dec 07 '23 06:12 kmn1024

Indeed we are working on Medusa... will release a new version soon :)

leeyeehoo avatar Dec 07 '23 06:12 leeyeehoo

Thanks, Yuhong =) Looking forward to it!

kmn1024 avatar Dec 07 '23 06:12 kmn1024

I want to ask for some advice regarding model performance. My goal is to run a custom model on pretty cheap, OpenCL-compatible hardware. Using MLC, the current speed is ~3 toks/sec, which is insufficient for fluid interaction.

What would you recommend? Getting Medusa to work on MLC would help a lot, and I would love to try working on it (though it looks pretty daunting). However, if Medusa v2 is coming out, perhaps I should just wait?
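For context on what "getting Medusa to work on MLC" would involve: Medusa attaches extra decoding heads that draft several future tokens in one forward pass, which the base model then verifies. Below is a minimal, hypothetical sketch of the greedy verify step only (illustration, not the actual Medusa or MLC-LLM API; `verify_draft` and `argmax_next` are made-up names):

```python
# Hypothetical sketch of Medusa-style draft-and-verify (greedy case).
# Not the real Medusa or MLC-LLM code.

def verify_draft(draft_tokens, argmax_next):
    """Accept the longest prefix of `draft_tokens` that matches the
    base model's greedy choices; on the first mismatch, substitute the
    base model's token and stop.

    draft_tokens: tokens proposed by the extra Medusa heads.
    argmax_next:  callable mapping a draft position to the base model's
                  greedy token there (stand-in for one batched forward
                  pass that scores the whole draft at once).
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        true_tok = argmax_next(i)
        if tok == true_tok:
            accepted.append(tok)       # draft agrees with the base model
        else:
            accepted.append(true_tok)  # correct the mismatch and stop
            break
    return accepted

# Toy example: the base model would greedily emit [5, 9, 2, 7].
base = [5, 9, 2, 7]
print(verify_draft([5, 9, 4], lambda i: base[i]))  # → [5, 9, 2]
```

Because verification scores all drafted tokens in a single batched forward pass, each accepted token beyond the first is nearly free, which is where the speedup over plain autoregressive decoding comes from.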

kmn1024 avatar Dec 07 '23 09:12 kmn1024

You can refer to this branch. There are no significant changes on the model side. We are working on fully finetuning the models (including new models like Zephyr) and will release the finetuning recipe; supporting more libraries will be the next step.

leeyeehoo avatar Dec 07 '23 11:12 leeyeehoo

Thanks for the heads up! If you have a chance, please also include a recipe for adding new types of models.

kmn1024 avatar Dec 07 '23 14:12 kmn1024

Check this. Not sure if you meant a template for adding support for new models?

leeyeehoo avatar Dec 07 '23 17:12 leeyeehoo

Yes! Thanks for pointing that out.

kmn1024 avatar Dec 08 '23 01:12 kmn1024