
[DRAFT] vLLM integration

Open vwxyzjn opened this issue 1 year ago • 4 comments

-- UPDATE 7/7/2024: after chatting with @lewtun, we'd like to see if vLLM is willing to support https://github.com/vllm-project/vllm/issues/6189 officially before merging this PR, as it may cause confusion for users.

This PR adds a vLLM backend for generation purposes. Preliminary testing shows generation is ~8x faster. Over 80 minutes of training, the run with HF generation completed 2,650 episodes, whereas the run with vLLM generation completed ~16,000 episodes.
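For reference, the reported episode counts translate into an end-to-end throughput ratio of roughly 6x (the ~8x figure above refers to generation speed alone, since training also spends time on non-generation steps). The arithmetic, using only the numbers reported here:

```python
# Throughput comparison from the numbers reported above:
# in 80 minutes, HF generation completed 2,650 episodes
# while vLLM generation completed ~16,000.
minutes = 80
hf_episodes = 2_650
vllm_episodes = 16_000

hf_rate = hf_episodes / minutes      # episodes per minute with HF generate
vllm_rate = vllm_episodes / minutes  # episodes per minute with vLLM

speedup = vllm_rate / hf_rate
print(round(speedup, 1))  # ~6x end-to-end episode throughput
```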


Note that your mileage may vary with different hardware / generation lengths. For example, on the TL;DR task with 1B models, vLLM does not seem to provide much speed benefit, likely due to the short generation length.

Note that we have to use our custom vLLM build to achieve precise device placement (so that we can place the vLLM instance on the 8th GPU). See vwxyzjn/vllm#1
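The common workaround without a custom build is to pin the process's view of GPUs before any CUDA library initializes, so the vLLM engine lands on the intended physical device. A minimal sketch (assumptions: an 8-GPU node, GPUs 0-6 reserved for training, GPU 7 for generation; this is not the PR's custom-build approach):

```python
import os

# Reserve physical GPU 7 (the "8th GPU") for the generation process.
# This must run before torch/vLLM initialize CUDA in this process.
TRAINER_GPUS = list(range(7))  # GPUs 0-6 for training (assumption)
GENERATION_GPU = 7             # GPU 7 for the vLLM instance

os.environ["CUDA_VISIBLE_DEVICES"] = str(GENERATION_GPU)
# Within this process, physical GPU 7 now appears as cuda:0, so e.g.
#   from vllm import LLM
#   llm = LLM(model=..., tensor_parallel_size=1)
# would run on physical GPU 7.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The limitation that motivated the custom build is that `CUDA_VISIBLE_DEVICES` is process-wide, so it cannot place vLLM on a different GPU than the trainer within the same process.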

vwxyzjn avatar May 07 '24 18:05 vwxyzjn

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

I'm really looking forward to this integration! Just out of curiosity, do you think using optimum or torch.compile as a generation backend is possible? @vwxyzjn

juyoungml avatar Jul 19 '24 05:07 juyoungml

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Aug 12 '24 15:08 github-actions[bot]

> I'm really looking forward to this integration! Just out of curiosity, do you think using optimum or torch.compile as a generation backend is possible? @vwxyzjn

Yes, I think torch.compile would be an option, with the caveat that currently only a few model architectures are supported.
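For illustration, the torch.compile route would mean compiling the model's forward pass used inside the decode loop. A toy sketch (the model and greedy loop below are illustrative stand-ins, not TRL's actual generation code; `backend="eager"` is used so the example runs without a compiler toolchain, whereas the default inductor backend is what would give speedups):

```python
import torch

# Toy language model standing in for a real HF architecture (assumption).
class TinyLM(torch.nn.Module):
    def __init__(self, vocab=32, dim=16):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, ids):
        return self.head(self.emb(ids))  # (batch, seq, vocab) logits

model = TinyLM()
# Compile the forward pass; default backend is "inductor",
# "eager" here only for portability of this sketch.
compiled = torch.compile(model, backend="eager")

@torch.no_grad()
def greedy_decode(ids, steps=4):
    for _ in range(steps):
        logits = compiled(ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return ids

out = greedy_decode(torch.zeros(1, 1, dtype=torch.long))
print(tuple(out.shape))  # (1, 5): the prompt token plus 4 generated tokens
```

The architecture caveat above applies because dynamic shapes in the decode loop can trigger recompilation, which not all model graphs handle well.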

lewtun avatar Aug 13 '24 07:08 lewtun

Hi, are there any updates? Thanks!

fzyzcjy avatar Dec 18 '24 12:12 fzyzcjy