Medusa
Support batch size > 1
This PR is supposed to add support for batch size > 1 to the Medusa inference model. It is only a draft for now and needs further improvement.
Main change
Update `update_inference_inputs`, `tree_decoding`, and `generate_candidates` to support batch size > 1.
Use `[PAD]` tokens to pad the shorter prompts in the batch to a common length, for example:
```
prompt:
A B C [PAD] [PAD]
D E F H I
```
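For illustration, here is a minimal sketch of how such right-padding of a batch could be done; the helper name `pad_batch` and the explicit attention mask are assumptions made for this sketch, not code from this PR.

```python
import torch

def pad_batch(prompt_ids_list, pad_id):
    """Right-pad a list of 1-D token-id tensors to the length of the longest one."""
    max_len = max(len(ids) for ids in prompt_ids_list)
    input_ids = torch.full((len(prompt_ids_list), max_len), pad_id, dtype=torch.long)
    attention_mask = torch.zeros_like(input_ids)
    for i, ids in enumerate(prompt_ids_list):
        input_ids[i, : len(ids)] = ids     # real tokens on the left
        attention_mask[i, : len(ids)] = 1  # zeros cover the [PAD] positions
    return input_ids, attention_mask

# Matches the example above: "A B C" and "D E F H I" become a 2 x 5 batch,
# with the first row padded to "A B C [PAD] [PAD]".
```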
The plan is to squeeze out the `[PAD]` tokens that end up in the middle of the inputs when new tokens are appended to each sequence, for example:
```
prompt:             new_tokens:
A B C [PAD] [PAD]   X Y
D E F H I           Z [PAD]

new sequence:
A B C X Y [PAD]
D E F H I Z
```
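The squeeze-and-append step could look roughly like the sketch below; `squeeze_pads_and_append` is a hypothetical helper written for this description, assuming right-padded `input_ids` and a known pad id, not the exact code in this PR.

```python
import torch

PAD_ID = 0  # assumed pad token id for this sketch

def squeeze_pads_and_append(input_ids, new_tokens_list):
    """Drop [PAD] tokens from each row, append that row's accepted new tokens,
    then right-pad all rows back to a common length."""
    merged = []
    for row, new_tokens in zip(input_ids, new_tokens_list):
        real = row[row != PAD_ID]                     # squeeze out the pads
        merged.append(torch.cat([real, new_tokens]))  # append accepted tokens
    max_len = max(len(seq) for seq in merged)
    out = torch.full((len(merged), max_len), PAD_ID, dtype=input_ids.dtype)
    for i, seq in enumerate(merged):
        out[i, : len(seq)] = seq                      # right-pad to max_len
    return out

# Mirrors the diagram above:
# row 0: [A B C] + [X Y] -> [A B C X Y PAD]
# row 1: [D E F H I] + [Z] -> [D E F H I Z]
```

If `[PAD]` shares an id with a real vocabulary token, the actual implementation would need an explicit pad mask instead of matching on the id.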
Test:

```
python -m medusa.inference.inference_test --model 'FasterDecoding/medusa-vicuna-7b-v1.3'
```
This PR is marked as a draft, as more work is required to get it into a mergeable state.