
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Results: 58 Medusa issues, sorted by recently updated

Changed the broken TGI link pointing to Medusa

The repo contains code and examples for tuning medusa heads for text-only LLMs. Is the code for Medusa(-2) directly compatible with VLMs as well? I assume that Medusa should be...

Suppose the first Medusa head generates the top-2 predictions "It is" and "It's", while the second Medusa head generates the top-3 predictions "difficult", "a", and "not". This results in a...
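
A minimal sketch of how such candidates can be formed: the continuations are the Cartesian product of each head's top-k predictions (2 × 3 = 6 here), which Medusa then verifies together via tree attention. The token strings below are purely illustrative, taken from the example above:

```python
from itertools import product

# Top-k predictions from each Medusa head (illustrative tokens from the example above).
head_1_top2 = ["It is", "It's"]
head_2_top3 = ["difficult", "a", "not"]

# Candidate continuations are the Cartesian product of the heads' predictions:
# 2 x 3 = 6 candidates, all checked in a single forward pass with tree attention.
candidates = [" ".join(tokens) for tokens in product(head_1_top2, head_2_top3)]
print(candidates)
# ['It is difficult', 'It is a', 'It is not', "It's difficult", "It's a", "It's not"]
```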

```
class MedusaModelABC(nn.Module):
    """The Medusa Language Model Head.

    This module creates a series of prediction heads (based on the 'medusa'
    parameter) on top of a given base model. Each head...
    """
```

In the `README.md`, you mentioned that

> The data preparation code for self-distillation can be found in [data_generation folder](https://github.com/FasterDecoding/Medusa/blob/main/data_generation) of the current repo.

In that folder, it says

> `python...

In medusa_model_legacy.py, the Medusa heads are only responsible for generating new hidden states; the medusa logits are still produced by reusing the base_model's lm_head. Here is...
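
A minimal sketch of the design described above, where each Medusa head is a residual block that outputs a hidden state and the shared base-model `lm_head` projects it to vocabulary logits. The sizes and the `ResBlock` structure here are illustrative assumptions, not the repo's exact code:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: a linear layer with SiLU, added back to its input."""
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x):
        return x + self.act(self.linear(x))

# Hypothetical wiring: each Medusa head only transforms the last hidden state,
# and the *shared* base-model lm_head maps each transformed state to logits.
hidden_size, vocab_size, num_heads = 4096, 32000, 4
medusa_heads = nn.ModuleList([ResBlock(hidden_size) for _ in range(num_heads)])
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)  # stands in for base_model.lm_head

last_hidden = torch.randn(1, 1, hidden_size)
medusa_logits = [lm_head(head(last_hidden)) for head in medusa_heads]
print(medusa_logits[0].shape)  # torch.Size([1, 1, 32000])
```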

Hi, when I was training with Vicuna v1.3, the loss was always NaN. My training script is:

```
torchrun --nproc_per_node=1 medusa/train/train_legacy.py \
    --model_name_or_path lmsys/vicuna-7b-v1.3 \
    --data_path mistral.json \
    --bf16 True \
    ...
```