chore: Refactor disaggregated serving scripts
To simplify disaggregated serving deployment and reduce duplicated code, the disaggregated workers and server can now be launched with:
python3 ${EXAMPLE_DIR}/launch_disaggregated_workers.py -c ${CONFIG_FILE}
trtllm-serve disaggregated -c ${CONFIG_FILE}
respectively, instead of
mpirun --allow-run-as-root -n ${NUM_RANKS} python3 ${EXAMPLE_DIR}/launch_disaggregated_workers.py -c ${CONFIG_FILE}
python3 ${EXAMPLE_DIR}/launch_disaggregated_server.py -c ${CONFIG_FILE}
The number of MPI ranks is now determined automatically from the config file, so the user no longer needs to calculate the total.
There was also duplicated code between launch_disaggregated_server.py and trtllm-serve, so the disaggregated server is now launched with trtllm-serve disaggregated.
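For reviewers, here is a rough sketch of how a total rank count could be derived from a config. It is not the code in this PR. The key names (context_servers, generation_servers, num_instances, tensor_parallel_size, pipeline_parallel_size) and the helper count_mpi_ranks are assumptions made for illustration, and the sketch assumes each instance needs tensor-parallel size times pipeline-parallel size ranks.

```python
# Illustrative only: the config layout and key names below are assumptions,
# not necessarily the schema read by launch_disaggregated_workers.py.

def count_mpi_ranks(config: dict) -> int:
    """Sum the ranks needed by every server group in the config."""
    total = 0
    for group in ("context_servers", "generation_servers"):
        section = config.get(group, {})
        instances = section.get("num_instances", 0)
        tp = section.get("tensor_parallel_size", 1)
        pp = section.get("pipeline_parallel_size", 1)
        # Assumed rule: one rank per (tensor, pipeline) shard of each instance.
        total += instances * tp * pp
    return total


# Hypothetical config: 2 context instances at TP=2, 1 generation instance at TP=4.
example_config = {
    "context_servers": {
        "num_instances": 2,
        "tensor_parallel_size": 2,
        "pipeline_parallel_size": 1,
    },
    "generation_servers": {
        "num_instances": 1,
        "tensor_parallel_size": 4,
        "pipeline_parallel_size": 1,
    },
}

print(count_mpi_ranks(example_config))  # 2*2*1 + 1*4*1 = 8
```

With a calculation like this done inside the launcher, the user does not have to pass -n ${NUM_RANKS} by hand.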
/bot run
PR_Github #464 [ run ] triggered by Bot
/bot run --disable-fail-fast
PR_Github #468 [ run ] triggered by Bot
PR_Github #464 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #397 completed with status: 'FAILURE'
PR_Github #468 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #401 completed with status: 'FAILURE'
/bot run
PR_Github #488 [ run ] triggered by Bot
PR_Github #488 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #420 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #593 [ run ] triggered by Bot
PR_Github #593 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #503 completed with status: 'SUCCESS'
/bot help
How do we run disaggregated serving with Slurm once the PR has merged?
/bot run --only-multi-gpu-test
PR_Github #668 [ run ] triggered by Bot
/bot run multi-gpu-test
PR_Github #670 Bot args parsing error!
PR_Github #668 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #562 (Partly Tested) completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #794 [ run ] triggered by Bot
PR_Github #794 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #642 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #922 [ run ] triggered by Bot
PR_Github #922 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #728 completed with status: 'SUCCESS'
/bot run --multi-gpu-test
/bot run --add-multi-gpu-test
PR_Github #1023 [ run ] triggered by Bot
PR_Github #1024 [ run ] triggered by Bot
PR_Github #1023 [ run ] completed with state ABORTED