How to use the finetuned Mistral model for inference with Medusa
As an example, you can refer to the Zephyr model (`python -m medusa.inference.cli --model FasterDecoding/medusa-1.0-zephyr-7b-beta`) :)
@ctlllll It seems that this command expects a Medusa model:
`python -m medusa.inference.cli --model [path of medusa model]`
But in my case, I am using the Mistral model (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), which is not based on Medusa. So can I use the Medusa library to improve my Mistral model's inference time?
You will need to train Medusa heads on top of the Hugging Face model before you can use it for inference.
@eldhosemjoy How do I train the Hugging Face model with Medusa heads? Can you share a reference?
You can use this script: https://github.com/FasterDecoding/Medusa/blob/main/medusa/train/train_legacy.py. I believe it is a Llama example, but you can try it with Mistral.
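For reference, a rough sketch of the training invocation, adapted from the command in the Medusa README. The flag names and hyperparameters below may have changed in the current repo, and the data path, output directory, and head counts are placeholders, so verify everything against the argument parser in `train_legacy.py`:

```bash
# Sketch of a Medusa-1 training run (heads only, frozen backbone), adapted
# from the Medusa README. Data path, output dir, and hyperparameters are
# placeholders; check medusa/train/train_legacy.py for the current flags.
torchrun --nproc_per_node=4 medusa/train/train_legacy.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --data_path ShareGPT_V4.3_unfiltered_cleaned_split.json \
    --bf16 True \
    --output_dir mistral_medusa \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-3 \
    --medusa_num_heads 3 \
    --medusa_num_layers 1
```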
Is there no way to run inference without training? I don't have the compute resources to train, so I wanted to run inference without training.
Try this model, which has already been ported to Medusa: https://huggingface.co/text-generation-inference/Mistral-7B-Instruct-v0.2-medusa/tree/main
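That repo is packaged for Hugging Face's text-generation-inference (TGI) server, which has built-in Medusa support. A minimal sketch of serving it with TGI's standard Docker launch; the image tag, port, and GPU flags are assumptions to adjust for your setup:

```bash
# Sketch: serve the Medusa-ported Mistral model with TGI.
# Image tag, port mapping, and GPU flags are placeholders for your environment.
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id text-generation-inference/Mistral-7B-Instruct-v0.2-medusa
```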