How to use the finetuned Mistral model for inference with Medusa

pradeepdev-1995 opened this issue on Jan 24 '24 · 7 comments

As an example, you can refer to the Zephyr model (python -m medusa.inference.cli --model FasterDecoding/medusa-1.0-zephyr-7b-beta) :)
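(If the `medusa.inference.cli` entry point isn't available yet, the repo's README installs the package with `pip install medusa-llm`, or you can do an editable install from source — worth double-checking against the current README.)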

ctlllll · Jan 25 '24

@ctlllll It seems that this command expects a Medusa model:

python -m medusa.inference.cli --model [path of medusa model]

But in my case, I am using the Mistral model (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), which is not based on Medusa. So can I use the Medusa library to improve my Mistral model's inference time?

pradeepdev-1995 · Jan 25 '24

You will need to train Medusa heads on top of the Hugging Face model before you can use it for Medusa inference.
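To make the idea concrete, here is a minimal PyTorch sketch of what Medusa-1-style heads look like: each head is a residual block plus a vocabulary projection, trained with a shifted cross-entropy while the base model stays frozen. The names (`MedusaHeads`, `head_loss`) are illustrative, not the library's actual API — see `medusa/model/medusa_model.py` in the repo for the real implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual block used inside each Medusa head (as in the paper)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x):
        return x + self.act(self.linear(x))

class MedusaHeads(nn.Module):
    """K extra decoding heads on top of a (frozen, in Medusa-1) base LM.

    Head k (1-indexed) sees the base model's last hidden state at
    position i and predicts the token at position i + k + 1, whereas
    the base LM head predicts position i + 1.
    """
    def __init__(self, hidden_size, vocab_size, num_heads=3):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(ResBlock(hidden_size),
                          nn.Linear(hidden_size, vocab_size, bias=False))
            for _ in range(num_heads)
        ])

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden) from the base model.
        # Returns one logits tensor of shape (batch, seq_len, vocab) per head.
        return [head(hidden_states) for head in self.heads]

def head_loss(logits_k, input_ids, k):
    """Shifted cross-entropy for head k (1-indexed): the prediction at
    position i is scored against the ground-truth token at i + k + 1."""
    shift = k + 1
    logits = logits_k[:, :-shift]
    targets = input_ids[:, shift:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```

At inference time, the candidate tokens proposed by these heads are verified in a single forward pass of the base model, which is where the speedup comes from.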

eldhosemjoy · Feb 02 '24

@eldhosemjoy How do I train the Hugging Face model with Medusa heads? Can you share a reference?

pradeepdev-1995 · Feb 02 '24

You can use this script: https://github.com/FasterDecoding/Medusa/blob/main/medusa/train/train_legacy.py. I believe it is a Llama example, but you can try it with Mistral.
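For reference, the README's training command looks roughly like the following. I've swapped the model path to Mistral; the hyperparameters are the README's defaults, so treat them as a starting point and check the flags against the current `train_legacy.py`:

```bash
torchrun --nproc_per_node=4 medusa/train/train_legacy.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --data_path ShareGPT_V4.3_unfiltered_cleaned_split.json \
    --bf16 True \
    --output_dir medusa_mistral \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-3 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --model_max_length 2048 \
    --lazy_preprocess True \
    --medusa_num_heads 3 \
    --medusa_num_layers 1
```

Since only the heads are trained (the base model is frozen in Medusa-1), this is far cheaper than full finetuning, but it still needs a GPU that can hold the base model.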

eldhosemjoy · Feb 02 '24

Is there no way to run inference without training? I don't have the computing resources to train, so I want to run inference without training.

MoOo2mini · Feb 05 '24

Try this model, which has been ported to Medusa: https://huggingface.co/text-generation-inference/Mistral-7B-Instruct-v0.2-medusa/tree/main
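Note that this checkpoint is packaged for Hugging Face's text-generation-inference (TGI) server rather than this repo's CLI. A typical TGI launch would be something like the following (the image tag and flags may differ for your setup):

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id text-generation-inference/Mistral-7B-Instruct-v0.2-medusa
```

As I understand it, TGI picks up the Medusa speculation heads from the checkpoint's config automatically, so no extra flags should be needed.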

gangooteli · May 11 '24