How to use the finetuned Mistral model for inference with Medusa
As an example, you can refer to the Zephyr model (`python -m medusa.inference.cli --model FasterDecoding/medusa-1.0-zephyr-7b-beta`) :)
@ctlllll It seems that this command expects a Medusa model:
`python -m medusa.inference.cli --model [path of medusa model]`
But in my case, I am using the Mistral model (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), which is not based on Medusa. So can I use the Medusa library to improve my Mistral model's inference time?
You will need to train Medusa heads on top of the Hugging Face model before you can use it for inference.
@eldhosemjoy How do I train the Hugging Face model with Medusa heads? Can you share a reference?
You can use this script: https://github.com/FasterDecoding/Medusa/blob/main/medusa/train/train_legacy.py. I believe it is a Llama example, but you can try it with Mistral.
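For reference, a rough sketch of the training invocation, adapted from the command in the Medusa README. The flag names and hyperparameters below may have changed in the current repo, and the data path, output directory, and head counts are placeholders, so verify everything against the argument parser in `train_legacy.py`:

```bash
# Sketch of a Medusa-1 training run (heads only, frozen backbone), adapted
# from the Medusa README. Data path, output dir, and hyperparameters are
# placeholders; check medusa/train/train_legacy.py for the current flags.
torchrun --nproc_per_node=4 medusa/train/train_legacy.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --data_path ShareGPT_V4.3_unfiltered_cleaned_split.json \
    --bf16 True \
    --output_dir mistral_medusa \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-3 \
    --medusa_num_heads 3 \
    --medusa_num_layers 1
```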
Is there no way to run inference without training? I don't have the compute resources to train, so I wanted to run inference without training.
Try this model, which has already been ported to Medusa: https://huggingface.co/text-generation-inference/Mistral-7B-Instruct-v0.2-medusa/tree/main
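That repo is packaged for Hugging Face's text-generation-inference (TGI) server, which has built-in Medusa support. A minimal sketch of serving it with TGI's standard Docker launch; the image tag, port, and GPU flags are assumptions to adjust for your setup:

```bash
# Sketch: serve the Medusa-ported Mistral model with TGI.
# Image tag, port mapping, and GPU flags are placeholders for your environment.
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id text-generation-inference/Mistral-7B-Instruct-v0.2-medusa
```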