
Hardcoded number of GPUs in run_react_infer.sh

tobrun opened this issue 5 months ago · 3 comments

This block of code forces a setup with 8 GPUs, where each GPU needs enough VRAM to host a full instance of the model:

CUDA_VISIBLE_DEVICES=0 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6001 --disable-log-requests &
CUDA_VISIBLE_DEVICES=1 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6002 --disable-log-requests &
CUDA_VISIBLE_DEVICES=2 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6003 --disable-log-requests &
CUDA_VISIBLE_DEVICES=3 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6004 --disable-log-requests &
CUDA_VISIBLE_DEVICES=4 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6005 --disable-log-requests &
CUDA_VISIBLE_DEVICES=5 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6006 --disable-log-requests &
CUDA_VISIBLE_DEVICES=6 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6007 --disable-log-requests &
CUDA_VISIBLE_DEVICES=7 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6008 --disable-log-requests &

This should be refactored into a simpler, configurable setup that adapts to the user's environment.
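
A minimal sketch of what that could look like, assuming the rest of the script consumes a main_ports array (NUM_GPUS, TP_SIZE, and BASE_PORT are illustrative knobs, not variables in the original script):

# Hypothetical configuration knobs, overridable from the environment
NUM_GPUS=${NUM_GPUS:-$(nvidia-smi --list-gpus | wc -l)}   # default: every GPU nvidia-smi can see
TP_SIZE=${TP_SIZE:-1}                                     # GPUs per server instance (tensor parallelism)
BASE_PORT=${BASE_PORT:-6001}

main_ports=()
for ((i = 0; i < NUM_GPUS / TP_SIZE; i++)); do
    # Comma-separated GPU list for this server, e.g. "0,1" when TP_SIZE=2
    gpus=$(seq -s, $((i * TP_SIZE)) $(((i + 1) * TP_SIZE - 1)))
    port=$((BASE_PORT + i))
    CUDA_VISIBLE_DEVICES=$gpus vllm serve "$MODEL_PATH" --host 0.0.0.0 --port "$port" \
        --tensor-parallel-size "$TP_SIZE" --disable-log-requests &
    main_ports+=("$port")
done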

tobrun avatar Sep 18 '25 04:09 tobrun

Hi, I want to ask: does it force you to use 8 GPUs? If I delete one line, will it fail to run?

YiJunSachs avatar Sep 18 '25 08:09 YiJunSachs

I can run it perfectly fine on a 4x RTX 6000 Ada setup using:

# Server 1 uses GPU 0 and 1 together
CUDA_VISIBLE_DEVICES=0,1 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6001 --tensor-parallel-size 2 --disable-log-requests &

# Server 2 uses GPU 2 and 3 together
CUDA_VISIBLE_DEVICES=2,3 vllm serve $MODEL_PATH --host 0.0.0.0 --port 6002 --tensor-parallel-size 2 --disable-log-requests &

main_ports=(6001 6002)
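
As a side note, a quick sanity check before kicking off inference could look like this (assuming vLLM's OpenAI-compatible API, which serves /v1/models):

# Confirm every instance in main_ports responds before running inference
for port in "${main_ports[@]}"; do
    curl -sf "http://localhost:${port}/v1/models" > /dev/null \
        && echo "server on port ${port} is up" \
        || echo "server on port ${port} is NOT responding"
done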

It's more about the usability of configuring the system. I would love to resolve this issue together with https://github.com/Alibaba-NLP/DeepResearch/issues/118; let me know what you think about that issue.

tobrun avatar Sep 18 '25 17:09 tobrun

Great configuration. I think all the modifications should be made to the ports, in both the .sh file and the .py file; configure the ports correctly and everything will be fine.
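
One hedged way to keep the two files in sync (the MAIN_PORTS variable and the entry-point name below are assumptions, not the repository's actual mechanism) is to export the port list once from the .sh file and have the .py file read it instead of hardcoding its own copy:

# Hypothetical glue: export the port list so the Python side can read it
main_ports=(6001 6002)
MAIN_PORTS=$(IFS=,; echo "${main_ports[*]}")   # -> "6001,6002"
export MAIN_PORTS

# The .py file would then read os.environ["MAIN_PORTS"] instead of a
# hardcoded list. (Entry-point name below is a placeholder.)
python inference_entrypoint.py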

zhaowenZhou avatar Sep 26 '25 09:09 zhaowenZhou