Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Hi @ctlllll, I am trying to use Medusa on a LLaMA model and run some Medusa-head experiments. When `base_model_config.medusa_num_heads` in `from_pretrained` (`medusa_model.py`) is set to 2 or 3, an error will...
Hello, I want to fine-tune the Medusa heads for Llama 2 70B. On an A100-80G, if I do not want to use a quantized model, the model cannot fit on a single A100....
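A rough back-of-the-envelope check (an illustrative sketch, not from the thread; it ignores activations, KV cache, and the Medusa heads themselves) shows why the unquantized base model does not fit: 70B parameters at 2 bytes each already exceed 80 GB, while an 8-bit frozen base would just fit.

```python
# Illustrative weight-memory estimate for a 70B base model on one 80 GB GPU.
# Numbers are approximate: only base-model weights are counted.

def base_model_gb(num_params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes) for the base model."""
    return num_params_billion * 1e9 * bytes_per_param / 1e9

fp16_gb = base_model_gb(70, 2)   # fp16/bf16 weights
int8_gb = base_model_gb(70, 1)   # 8-bit quantized weights

print(f"fp16 weights: {fp16_gb:.0f} GB")  # 140 GB -> does not fit in 80 GB
print(f"int8 weights: {int8_gb:.0f} GB")  # 70 GB  -> fits, barely
```

This is why the thread asks about quantization: with the base model frozen and quantized, only the small Medusa heads need gradients and optimizer state.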
OSError
```shell
python gen_model_answer_baseline.py --model-path /data/transformers/vicuna-7b-v1.3 --model-id vicuna-7b-v1.3-0
python gen_model_answer_medusa.py --model-path /data/transformers/medusa_vicuna-7b-v1.3 --model-id medusa-vicuna-7b-v1.3-0
```

My vicuna-7b-v1.3 download comes from: https://huggingface.co/FasterDecoding/medusa-vicuna-7b-v1.3/tree/main
My medusa-vicuna-7b-v1.3 download comes from: https://huggingface.co/FasterDecoding/medusa-vicuna-7b-v1.3/tree/main
I used this command to add the local...
Roadmap
# Roadmap

## Functionality

- [x] #36
- [x] #39
- [ ] Distill from any model without access to the original training data
- [ ] Batched inference
- ...
Hi, I'm not an expert, so this might be a naive question, but I'm confused about the heads-warmup part of the Medusa paper. In that part it...
vLLM support
https://github.com/mlc-ai/mlc-llm https://github.com/mlc-ai/llm-perf-bench
We are currently running out of bandwidth. Contributors to help integrate Medusa into llama.cpp would be greatly appreciated :)
Can Medusa use FasterTransformer in the future?
The Qwen 7B/14B models look strong. I understand we don't have access to their dataset, but it would still be extremely useful to have Medusa heads fine-tuned on a smaller Chinese/English dataset.