
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

58 Medusa issues

Thanks for the wonderful work. From what I can tell, you are using a greedy decoding strategy. I would like to know whether Medusa's tree attention supports a beam search decoding strategy.
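For context, a toy illustration of the difference, using made-up logits for a single step: greedy decoding keeps one argmax token, while beam search keeps the top-k scoring candidates alive.

```python
import torch

# Toy logits for one decoding step (vocabulary size is illustrative).
logits = torch.randn(1, 32000)

greedy_token = logits.argmax(dim=-1)            # greedy: one best token per step
beam_scores, beam_ids = logits.topk(4, dim=-1)  # beam search: keep top-k candidates
```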

I ran into the following error when running inference with the base model `Mistral-7b-Instruct-v0.2`:

```
File "~/Medusa/medusa/model/modeling_mistral_kv.py", line 74, in _make_sliding_window_causal_mask
    mask = torch.triu(mask, diagonal=-sliding_window)
                                     ^^^^^^^^^^^^^^^
TypeError: bad operand type for unary -: 'NoneType'
```

I...
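The config for `Mistral-7b-Instruct-v0.2` sets `sliding_window` to `None`, so negating it fails. A minimal patch sketch, assuming the intended behavior is to fall back to a plain causal mask when no sliding window is configured:

```python
# Hypothetical guard inside _make_sliding_window_causal_mask: only apply the
# sliding-window cutoff when the config actually defines one; otherwise keep
# the plain causal mask built above.
if sliding_window is not None:
    mask = torch.triu(mask, diagonal=-sliding_window)
```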

Hello! I'd like to learn how to run inference with a baseline Vicuna model without Medusa support. Additionally, I'm curious whether any analysis has been done on...
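For the baseline part, a minimal sketch of plain Hugging Face generation with a Vicuna checkpoint, no Medusa heads involved; the model ID and prompt are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.3"  # any Vicuna checkpoint should work here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("What is speculative decoding?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)  # greedy baseline
print(tokenizer.decode(output[0], skip_special_tokens=True))
```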

I noticed that the Medusa model uses a pre-defined sparse tree structure for inference, and that different models have their corresponding tree structures. What is the detailed procedure for defining this...
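For reference, Medusa-style trees are expressed as lists of index paths: each tuple names which top-k candidate to take from each successive head, and the union of paths forms the sparse candidate tree consumed by tree attention. The values below are illustrative, not the repo's tuned choices:

```python
# Illustrative (not tuned) Medusa-style tree. Each tuple is a path through the
# heads' top-k predictions: (1, 0) means "head 1's second-best token, then
# head 2's best token". Tree attention scores all paths in a single pass.
medusa_choices = [
    (0,),
    (0, 0),
    (0, 1),
    (1,),
    (1, 0),
]
```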

Updated `is_flash_attn_available` to `is_flash_attn_2_available` in `transformers.utils`.
Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/utils/__init__.py
PR: https://github.com/huggingface/transformers/pull/26785
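For anyone pinned to an older `transformers`, a hedged compatibility shim (the fallback is my suggestion, not part of the PR):

```python
# Use the renamed helper when available; fall back to the pre-rename name on
# older transformers versions.
try:
    from transformers.utils import is_flash_attn_2_available
except ImportError:
    from transformers.utils import is_flash_attn_available as is_flash_attn_2_available
```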

When I try to use the provided example to train the Vicuna-7B model on Colab, I get the following error:

```
2024-08-25 17:30:20.138324: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to...
```

https://github.com/linkedin/Liger-Kernel/tree/main/examples/medusa With the implementation of FusedLinearCrossEntropy and other kernels in Liger-Kernel, we are able to effectively reduce memory usage while increasing throughput. We would be happy to collaborate and integrate...
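A minimal sketch of what the integration looks like on the caller's side, assuming Liger-Kernel's `apply_liger_kernel_to_llama` patch entry point (exact keyword arguments may differ across versions):

```python
# Hedged sketch: patch Llama-family modules with Liger kernels before loading
# the model; the fused linear cross-entropy avoids materializing full logits.
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama(fused_linear_cross_entropy=True)
# ...load and train the model as usual; patched modules are used transparently.
```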

I'm preparing to train the Medusa heads following the README and first ran into the following issue:

```
/data/lx/demo/Medusa/medusa/train/train_legacy.py:392: FutureWarning: `tokenizer` is deprecated and will be removed in version...
```
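Recent `transformers` releases deprecate `Trainer(tokenizer=...)` in favor of `processing_class`, which is where this warning comes from. A hedged sketch of the workaround; `model`, `training_args`, and `tokenizer` are assumed to be defined as in the training script:

```python
from transformers import Trainer

# Only the keyword changes: pass the tokenizer as processing_class instead of
# the deprecated tokenizer argument.
trainer = Trainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # was: tokenizer=tokenizer
)
```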

When using multiple residual blocks in the Medusa MLP heads, parameters are wrongly shared. This was already reported against Hydra and has already been fixed in the Liger-Kernel repository: https://github.com/zankner/Hydra/issues/8 https://github.com/linkedin/Liger-Kernel/pull/269
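The underlying pitfall is likely the classic one sketched below, with `nn.Linear` standing in for the residual block:

```python
import torch.nn as nn

hidden_size, num_layers = 4096, 2

# Buggy: list multiplication repeats the SAME module object, so every "layer"
# shares one set of parameters.
shared = nn.Sequential(*([nn.Linear(hidden_size, hidden_size)] * num_layers))

# Fixed: a comprehension constructs a distinct module per layer.
distinct = nn.Sequential(*[nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)])

assert shared[0] is shared[1]          # identical object -> shared weights
assert distinct[0] is not distinct[1]  # independent parameters
```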

Hey guys! Is there any plan to support other types of LLMs besides Llama and Mistral?