
Leverage vLLM batching to lower hardware requirements and improve speed

Open tobrun opened this issue 5 months ago • 4 comments

The current inference system launches 8 separate vLLM instances (one per GPU) but underutilizes vLLM's native batching capabilities. Each query is assigned to a single vLLM instance in round-robin fashion, so each instance effectively processes one request at a time. This approach:

  • Wastes computational resources as VLLM can handle multiple concurrent requests internally
  • Makes the system unusable for users with limited GPUs (e.g., single GPU setups)
  • Creates unnecessary overhead from running multiple server processes
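The difference can be sketched in plain Python. This is a hypothetical illustration, not the project's code: `fake_generate` stands in for a call into a vLLM engine, and the timings only simulate decode latency. The point is that submitting requests concurrently lets a single engine batch them, whereas the current dispatch awaits one request before sending the next.

```python
import asyncio
import time

async def fake_generate(prompt: str) -> str:
    """Stand-in for a vLLM engine call; sleep simulates decode time."""
    await asyncio.sleep(0.1)
    return f"answer:{prompt}"

async def round_robin_serial(prompts):
    # Current behavior: each instance sees one request at a time,
    # so requests are effectively processed back to back.
    return [await fake_generate(p) for p in prompts]

async def batched_concurrent(prompts):
    # Proposed behavior: submit all requests at once; a real vLLM
    # engine merges in-flight requests into a continuous batch.
    return await asyncio.gather(*(fake_generate(p) for p in prompts))

async def main():
    prompts = [f"q{i}" for i in range(8)]

    t0 = time.perf_counter()
    serial = await round_robin_serial(prompts)
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    concurrent = await batched_concurrent(prompts)
    t_concurrent = time.perf_counter() - t0

    assert serial == concurrent  # same results, very different wall time
    print(f"serial={t_serial:.2f}s concurrent={t_concurrent:.2f}s")

asyncio.run(main())
```

With 8 requests at 0.1 s each, the serial path takes roughly 0.8 s while the concurrent path takes roughly 0.1 s; the real gain depends on GPU utilization and batch scheduling, which is why benchmarking is needed.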

There are multiple possible solutions: a single vLLM instance with tensor parallelism, or a hybrid setup with a few instances that still leverage batching. This all needs more benchmarking, but I believe we could optimize both for users with smaller hardware configurations and for speed in general.
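The single-instance option could look like the sketch below, using vLLM's `vllm serve` CLI. The Hugging Face model id is an assumption based on the model name mentioned in this thread, and the flag values are illustrative starting points for benchmarking, not tuned settings:

```shell
# One vLLM server spanning all 8 GPUs via tensor parallelism, instead of
# 8 separate single-GPU instances. Continuous batching then happens
# inside this one engine across all incoming requests.
vllm serve Alibaba-NLP/Tongyi-DeepResearch-30B-A3B \
    --tensor-parallel-size 8

# Single-GPU users could instead run one instance and rely on batching
# alone; fitting the model may require a reduced context length (or
# quantization), e.g.:
# vllm serve Alibaba-NLP/Tongyi-DeepResearch-30B-A3B --max-model-len 8192
```

A hybrid setup (e.g. two servers with `--tensor-parallel-size 4` each, behind a load balancer) trades some per-request latency for redundancy; which split wins depends on the workload and needs measurement.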

tobrun avatar Sep 18 '25 17:09 tobrun

[image attachment]

mariaholland avatar Sep 19 '25 15:09 mariaholland

Does vLLM support deploying the Tongyi-DeepResearch-30B-A3B model?

PeterXiaTian avatar Sep 30 '25 02:09 PeterXiaTian

[image attachment] Is vLLM deployment still not supported?

PeterXiaTian avatar Sep 30 '25 03:09 PeterXiaTian

Hi bebe!!!

How’s Paris this time of the year????

X


mariaholland avatar Sep 30 '25 04:09 mariaholland