
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

58 Medusa issues

Thanks for the wonderful work. From what I can tell, you are using a greedy decoding strategy. I would like to know whether Medusa's tree attention supports a beam search decoding strategy.
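For context, a toy illustration of the difference, using made-up logits for a single step: greedy decoding keeps one argmax token, while beam search keeps the top-k scoring candidates alive.

```python
import torch

# Toy logits for one decoding step (vocabulary size is illustrative).
logits = torch.randn(1, 32000)

greedy_token = logits.argmax(dim=-1)            # greedy: one best token per step
beam_scores, beam_ids = logits.topk(4, dim=-1)  # beam search: keep top-k candidates
```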

I ran into the following error when running inference with the base model `Mistral-7b-Instruct-v0.2`:

```
File "~/Medusa/medusa/model/modeling_mistral_kv.py", line 74, in _make_sliding_window_causal_mask
    mask = torch.triu(mask, diagonal=-sliding_window)
                                     ^^^^^^^^^^^^^^^
TypeError: bad operand type for unary -: 'NoneType'
```

I...
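The config for `Mistral-7b-Instruct-v0.2` sets `sliding_window` to `None`, so negating it fails. A minimal patch sketch, assuming the intended behavior is to fall back to a plain causal mask when no sliding window is configured:

```python
# Hypothetical guard inside _make_sliding_window_causal_mask: only apply the
# sliding-window cutoff when the config actually defines one; otherwise keep
# the plain causal mask built above.
if sliding_window is not None:
    mask = torch.triu(mask, diagonal=-sliding_window)
```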

Hello! I'd like to learn how to run inference with a baseline Vicuna model without Medusa support. Additionally, I'm curious whether any analysis has been done on...
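For the baseline part, a minimal sketch of plain Hugging Face generation with a Vicuna checkpoint, no Medusa heads involved; the model ID and prompt are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.3"  # any Vicuna checkpoint should work here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("What is speculative decoding?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)  # greedy baseline
print(tokenizer.decode(output[0], skip_special_tokens=True))
```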

I noticed that the Medusa model uses a pre-defined sparse tree structure for inference, and that different models have their corresponding tree structures. What is the detailed procedure for defining this...
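For reference, Medusa-style trees are expressed as lists of index paths: each tuple names which top-k candidate to take from each successive head, and the union of paths forms the sparse candidate tree consumed by tree attention. The values below are illustrative, not the repo's tuned choices:

```python
# Illustrative (not tuned) Medusa-style tree. Each tuple is a path through the
# heads' top-k predictions: (1, 0) means "head 1's second-best token, then
# head 2's best token". Tree attention scores all paths in a single pass.
medusa_choices = [
    (0,),
    (0, 0),
    (0, 1),
    (1,),
    (1, 0),
]
```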

Updated `is_flash_attn_available` to `is_flash_attn_2_available` in `transformers.utils`.
Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/utils/__init__.py
PR: https://github.com/huggingface/transformers/pull/26785
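For anyone pinned to an older `transformers`, a hedged compatibility shim (the fallback is my suggestion, not part of the PR):

```python
# Use the renamed helper when available; fall back to the pre-rename name on
# older transformers versions.
try:
    from transformers.utils import is_flash_attn_2_available
except ImportError:
    from transformers.utils import is_flash_attn_available as is_flash_attn_2_available
```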

When I try to use the provided example to train the Vicuna-7B model on Colab, I get the following error:

```
2024-08-25 17:30:20.138324: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to...
```

https://github.com/linkedin/Liger-Kernel/tree/main/examples/medusa With the implementation of FusedLinearCrossEntropy and other kernels in Liger-Kernel, we are able to effectively reduce memory usage while increasing throughput. We would be happy to collaborate and integrate...
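A minimal sketch of what the integration looks like on the caller's side, assuming Liger-Kernel's `apply_liger_kernel_to_llama` patch entry point (exact keyword arguments may differ across versions):

```python
# Hedged sketch: patch Llama-family modules with Liger kernels before loading
# the model; the fused linear cross-entropy avoids materializing full logits.
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama(fused_linear_cross_entropy=True)
# ...load and train the model as usual; patched modules are used transparently.
```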

I'm preparing to train the Medusa heads following the README and first ran into the following issue:

```
/data/lx/demo/Medusa/medusa/train/train_legacy.py:392: FutureWarning: `tokenizer` is deprecated and will be removed in version...
```
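Recent `transformers` releases deprecate `Trainer(tokenizer=...)` in favor of `processing_class`, which is where this warning comes from. A hedged sketch of the workaround; `model`, `training_args`, and `tokenizer` are assumed to be defined as in the training script:

```python
from transformers import Trainer

# Only the keyword changes: pass the tokenizer as processing_class instead of
# the deprecated tokenizer argument.
trainer = Trainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # was: tokenizer=tokenizer
)
```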

When using multiple residual blocks in the Medusa MLP heads, parameters are wrongly shared. This was already reported against Hydra and has already been fixed in the Liger-Kernel repository: https://github.com/zankner/Hydra/issues/8 https://github.com/linkedin/Liger-Kernel/pull/269
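The underlying pitfall is likely the classic one sketched below, with `nn.Linear` standing in for the residual block:

```python
import torch.nn as nn

hidden_size, num_layers = 4096, 2

# Buggy: list multiplication repeats the SAME module object, so every "layer"
# shares one set of parameters.
shared = nn.Sequential(*([nn.Linear(hidden_size, hidden_size)] * num_layers))

# Fixed: a comprehension constructs a distinct module per layer.
distinct = nn.Sequential(*[nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)])

assert shared[0] is shared[1]          # identical object -> shared weights
assert distinct[0] is not distinct[1]  # independent parameters
```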

Hey guys! Is there any plan to support other types of LLMs besides Llama and Mistral?