Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Thanks for the wonderful work. I am trying to improve the performance with medusa2. But when I start the training of stage 2 based on the model from stage 1,...
Is there a plan to write a script to calculate the PPL (perplexity) of the Medusa model?
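For reference, perplexity is the exponential of the negative mean log-probability per token. The snippet below is only a minimal sketch of that formula, not a script from the Medusa repository: the helper name `perplexity` and the example values are hypothetical, and it assumes the per-token natural-log probabilities have already been obtained from the model.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical example: four tokens, each predicted with probability 0.5.
example_log_probs = [math.log(0.5)] * 4
print(perplexity(example_log_probs))
```

A lower value means the model assigned higher probability to the reference tokens on average.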
As per title.
Currently the training script's data loaders only support chat-based data, not instruct data. I've made changes locally to handle this properly, but can't seem to be able to open...
Hello authors, While reading your code, I noticed that the multiple Medusa heads you proposed compute their results in parallel:
```
for i in range(self.medusa):
    medusa_logits.append(self.medusa_head[i](hidden_states))
```
(although the later...
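The loop quoted in this issue can be illustrated with a toy sketch. It is not the repository's implementation: `make_linear_head`, the dimensions, and the random weights are hypothetical stand-ins for the real head modules. It only shows that each head reads the same `hidden_states` and none reads another head's output, so the order of evaluation does not change the results.

```python
import random


def make_linear_head(in_dim, out_dim, rng):
    """Build a toy linear head (hypothetical stand-in for a real head module)."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def head(hidden):
        # One logit per output row: dot product of the row with the hidden vector.
        return [sum(w * h for w, h in zip(row, hidden)) for row in weights]

    return head


rng = random.Random(0)
hidden_dim, vocab_size, num_heads = 4, 6, 3
heads = [make_linear_head(hidden_dim, vocab_size, rng) for _ in range(num_heads)]
hidden_states = [0.1, -0.2, 0.3, 0.4]

# Same pattern as the quoted loop: every head consumes the same hidden_states.
medusa_logits = []
for i in range(num_heads):
    medusa_logits.append(heads[i](hidden_states))

# Evaluating the heads in reverse order and restoring the order gives the same
# logits, because no head depends on another head's output.
reversed_logits = [heads[i](hidden_states) for i in reversed(range(num_heads))][::-1]
assert medusa_logits == reversed_logits
```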
I followed the training steps to train the llama2 model, but encountered the following error. I searched a lot, but still couldn't solve it. ``` UndefinedError File "/home/hs/anaconda3/envs/onebit/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 678,...
Hi, based on https://github.com/FasterDecoding/Medusa/blob/main/notebooks/medusa_introduction.ipynb, "FasterDecoding/medusa-vicuna-7b-v1.3" should have 4 medusa_num_heads. However, on Hugging Face it has only 2: https://huggingface.co/FasterDecoding/medusa-vicuna-7b-v1.3/blob/main/config.json. Do you have any plans to share the trained Medusa heads on Hugging Face?
Hello. After fine-tuning the Medusa head, I discovered an issue affecting inference performance and would like to share my findings. Normally, when a model is trained correctly, using TGI to...
This seems to be a bug, also [reported by @xiezipeng-ML](https://github.com/FasterDecoding/Medusa/issues/101). Please review.