
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

58 Medusa issues, sorted by recently updated

Hi @ctlllll, I'm trying to use Medusa on a Llama model and running some Medusa-head experiments. When `base_model_config.medusa_num_heads` in `from_pretrained` (medusa_model.py) is set to 2 or 3, an error will...
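For context on what `medusa_num_heads` controls: the Medusa approach attaches K extra decoding heads on top of the base model's last hidden state, where head k speculates the token k+1 positions ahead. A minimal numpy sketch of that shape contract (hypothetical helper names, not the repo's actual `MedusaModel` API):

```python
import numpy as np

def make_medusa_heads(hidden_size, vocab_size, medusa_num_heads, seed=0):
    # One projection matrix per head; head k predicts the token at t + k + 1.
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((hidden_size, vocab_size)) * 0.02
            for _ in range(medusa_num_heads)]

def medusa_logits(last_hidden, heads):
    # last_hidden: (seq_len, hidden_size) from the base model.
    # Returns (medusa_num_heads, seq_len, vocab_size) speculative logits.
    return np.stack([last_hidden @ w for w in heads])

hidden = np.ones((4, 8))                       # toy states: seq_len=4, hidden=8
heads = make_medusa_heads(8, 16, medusa_num_heads=3)
print(medusa_logits(hidden, heads).shape)      # (3, 4, 16)
```

Changing `medusa_num_heads` only changes the leading dimension of the speculative logits, so a mismatch between the config value and the number of head weights in the checkpoint is one plausible source of the error above.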

Hello, I want to fine-tune the Llama-2 70B Medusa heads, but without a quantized model it will not fit on a single A100-80G....
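A quick back-of-envelope check confirms why the unquantized model cannot fit: the weights alone exceed 80 GB in fp16 (this counts only weights; activations, KV cache, and optimizer state come on top):

```python
# Rough memory estimate for Llama-2-70B weights at different precisions.
num_params = 70e9
for dtype, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gb = num_params * bytes_per_param / 1e9
    print(f"{dtype}: ~{gb:.0f} GB")
# fp16: ~140 GB  -> does not fit on one 80 GB A100
# int8: ~70 GB   -> fits, with little headroom
# int4: ~35 GB   -> fits comfortably
```

Since head training keeps the base model frozen, loading the backbone in 8-bit or 4-bit is the usual workaround for a single 80 GB card.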

```
python gen_model_answer_baseline.py --model-path /data/transformers/vicuna-7b-v1.3 --model-id vicuna-7b-v1.3-0
python gen_model_answer_medusa.py --model-path /data/transformers/medusa_vicuna-7b-v1.3 --model-id medusa-vicuna-7b-v1.3-0
```

My vicuna-7b-v1.3 download comes from: https://huggingface.co/FasterDecoding/medusa-vicuna-7b-v1.3/tree/main My medusa-vicuna-7b-v1.3 download comes from: https://huggingface.co/FasterDecoding/medusa-vicuna-7b-v1.3/tree/main I used this command to add the local...

# Roadmap

## Functionality

- [x] #36
- [x] #39
- [ ] Distill from any model without access to the original training data
- [ ] Batched inference
- ...

documentation

Hi, I'm not an expert, so this might be a naive question, but I have a question about the heads-warmup part of the Medusa paper. In that part it...
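For readers who haven't seen that section: the warmup described in the paper is a two-stage schedule, first training only the new heads with the base model frozen, then fine-tuning everything jointly with a much smaller base-model learning rate. A hypothetical sketch of the resulting optimizer parameter groups (names and learning rates are illustrative, not the paper's exact values):

```python
def warmup_param_groups(stage, head_lr=1e-3, base_lr=1e-5):
    # Stage 1 ("warmup"): freeze the base model, train only the Medusa heads.
    # Stage 2: joint fine-tuning, base model gets a much smaller lr.
    if stage == 1:
        return [{"params": "medusa_heads", "lr": head_lr}]
    return [{"params": "medusa_heads", "lr": head_lr},
            {"params": "base_model", "lr": base_lr}]

print(warmup_param_groups(1))  # only the heads receive gradients
print(warmup_param_groups(2))  # joint stage adds the base model at base_lr
```

The point of the warmup stage is to let the randomly initialized heads converge without disturbing the pretrained backbone.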

https://github.com/mlc-ai/mlc-llm https://github.com/mlc-ai/llm-perf-bench

enhancement

We are currently running out of bandwidth. Contributors to help integrate Medusa into llama.cpp would be greatly appreciated :)

enhancement

Can Medusa use FasterTransformer in the future?

The Qwen 7B/14B models look strong. I understand we don't have access to their dataset, but it would still be extremely useful to have a Medusa fine-tuned on a smaller Chinese/English dataset.