The legacy Medusa Head structure is inconsistent with the new one.
In medusa_model_legacy.py, each Medusa head is only responsible for producing new hidden states; the Medusa logits are still computed by reusing the base model's lm_head.
Here is the code: https://github.com/FasterDecoding/Medusa/blob/e2a5d20c048a9b0a4092e6933c34313687422518/medusa/model/medusa_model_legacy.py#L203-L206
However, in the new medusa_model.py (or medusa_model_new.py), this has changed: each Medusa head now has its own "lm_head" (a Linear layer with in_features = hidden_size and out_features = vocab_size), as shown in the code here: https://github.com/FasterDecoding/Medusa/blob/e2a5d20c048a9b0a4092e6933c34313687422518/medusa/model/medusa_model.py#L111-L119
The corresponding inference code is here: https://github.com/FasterDecoding/Medusa/blob/e2a5d20c048a9b0a4092e6933c34313687422518/medusa/model/medusa_model.py#L215-L218
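To make the difference concrete, here is a minimal PyTorch sketch of the two structures as I understand them (the `ResBlock`, layer sizes, and activation are simplified assumptions for illustration, not the repo's exact code):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Simplified residual block standing in for a Medusa head's hidden-state transform."""
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x):
        return x + self.act(self.linear(x))

hidden_size, vocab_size = 16, 32

# Legacy style: the head only transforms hidden states;
# logits come from the shared base-model lm_head.
legacy_head = ResBlock(hidden_size)
shared_lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

# New style: each head bundles its own Linear "lm_head"
# (in_features = hidden_size, out_features = vocab_size).
new_head = nn.Sequential(
    ResBlock(hidden_size),
    nn.Linear(hidden_size, vocab_size, bias=False),
)

x = torch.randn(1, hidden_size)
legacy_logits = shared_lm_head(legacy_head(x))  # shape (1, vocab_size)
new_logits = new_head(x)                        # shape (1, vocab_size)
```

Both paths end up with per-head logits of shape (batch, vocab_size); the question is only whether the vocab projection is shared with the base model or trained separately per head.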
This is very confusing, especially since the README.md documents both the legacy and the new training methods. Which of the two actually corresponds to the performance reported in the paper?
Thank you very much for your work, looking forward to your reply or anyone's discussion.
@leeyeehoo @ctlllll @Narsil Hello everyone, sorry to bother you. Since issues in this repo do not seem to have been maintained for a long time, could the core contributors please answer these questions?