Is Medusa(-2) compatible with vision-language models (VLMs)?

Open MoritzLaurer opened this issue 1 year ago • 0 comments

The repo contains code and examples for tuning Medusa heads for text-only LLMs. Is the Medusa(-2) code directly compatible with VLMs as well? I assume Medusa should work with VLMs, since they do standard next-token prediction just like text-only LLMs, but I wonder how many code changes would be needed to tune a VLM with Medusa.
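To make the question concrete, here is a minimal sketch of the one piece I would expect to need care with a VLM: building the per-head training targets when part of the sequence is image tokens. Everything in it is my assumption rather than the repo's code: the helper name `medusa_targets` is hypothetical, I assume image-placeholder positions are already masked with `-100` (the default `ignore_index` of PyTorch's cross-entropy loss), and I assume head `k` (0-indexed) at position `t` predicts the token at `t + k + 2`, with the base LM head covering `t + 1`.

```python
IGNORE_INDEX = -100  # assumption: image/prompt positions are masked with this value


def medusa_targets(labels, num_heads, ignore_index=IGNORE_INDEX):
    """Build one target list per Medusa head (hypothetical helper).

    Assumption: head k at position t is trained to predict labels[t + k + 2].
    Positions that run past the end of the sequence are filled with
    ignore_index, so every target list has the same length as `labels`.
    """
    targets = []
    for k in range(num_heads):
        shift = k + 2
        # Shift the labels left and pad the tail so the length is unchanged.
        padding = [ignore_index] * min(shift, len(labels))
        targets.append(labels[shift:] + padding)
    return targets


# Three masked image-placeholder positions followed by four text tokens.
labels = [IGNORE_INDEX] * 3 + [11, 12, 13, 14]
targets = medusa_targets(labels, num_heads=2)

for k, head_targets in enumerate(targets):
    print(f"head {k}: {head_targets}")
```

If this is roughly right, masked image positions simply carry through the shift, and the text-only recipe would apply unchanged as long as the VLM's data collator produces such labels.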

Dec 03 '24 09:12 MoritzLaurer