verl
Support for vLLM async pipeline parallelism
Feature request
In many cases, the limited memory of a single GPU means some models can only run in vLLM with pipeline parallelism, even at the cost of some training efficiency. It would be great to have official support for this.
Following #2851, I have successfully gotten this approach running, but the latest code has changed significantly.
Model: Qwen2.5-7B (num_attention_heads=28, vocab_size=152064). Only tp=4 can be used, not tp=28: vLLM requires vocab_size to be divisible by the tensor-parallel size, and 152064 is not divisible by 28.
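A quick sketch of the constraint described above (the helper name is hypothetical, not a vLLM API): a tensor-parallel size must divide both the attention head count and, because of vLLM's vocab-parallel embedding, the vocab size.

```python
def valid_tp_sizes(num_attention_heads: int, vocab_size: int, max_tp: int = 32):
    # Hypothetical helper: list TP sizes that divide both the head count
    # and the vocab size, mirroring vLLM's sharding requirements.
    return [tp for tp in range(1, max_tp + 1)
            if num_attention_heads % tp == 0 and vocab_size % tp == 0]

# Qwen2.5-7B: 28 heads, vocab_size 152064
print(valid_tp_sizes(28, 152064))  # → [1, 2, 4]
```

Since 152064 = 2^9 * 3^3 * 11 has no factor of 7, tp=7, 14, or 28 are ruled out even though they divide the head count, which is why pipeline parallelism is the only way to shard this model further.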
Motivation
Single-GPU memory is limited.
Your contribution
Following #2851, I have successfully gotten this approach running, but the latest code has changed significantly.