feat(llama2-70b): Add multinode to SUT_API.py for the offline scenario
## Motivation
The LLaMA-2-70B benchmark (Offline Scenario) currently does not have multinode support.
## Contents
This PR adds multinode inference support to the LLaMA-2-70B benchmark (Offline scenario) by enabling SUT_API.py to issue requests to multiple OpenAI-compatible endpoints (e.g., vLLM, TensorRT-LLM) simultaneously. Prompts are partitioned near-evenly across servers (see the sketch after the list below).
- Multi-server API mode for SUT_API (`--vllm`) with even prompt distribution across multiple OpenAI-compatible endpoints.
- Unit tests for API-related logic (`query_batch` and `query_servers`).
- Documentation updates and example commands for multinode usage.
- Additional dependencies specified in the READMEs.
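The distribution logic can be pictured roughly as follows. This is a minimal sketch, assuming a `requests`-based client and an OpenAI-compatible `/v1/completions` route; the actual `query_batch`/`query_servers` implementations live in SUT_API.py and may differ in signature and batching details.

```python
# Sketch only: function names match the PR's description, but bodies are assumed.
import concurrent.futures
import requests


def partition(prompts, n_servers):
    """Split prompts into n_servers near-equal contiguous chunks."""
    base, rem = divmod(len(prompts), n_servers)
    chunks, start = [], 0
    for i in range(n_servers):
        size = base + (1 if i < rem else 0)  # first `rem` chunks get one extra prompt
        chunks.append(prompts[start:start + size])
        start += size
    return chunks


def query_batch(server, prompts, model_name):
    """Send one chunk of prompts to a single OpenAI-compatible endpoint."""
    resp = requests.post(
        f"{server}/v1/completions",
        json={"model": model_name, "prompt": prompts, "max_tokens": 1024},
    )
    resp.raise_for_status()
    return [choice["text"] for choice in resp.json()["choices"]]


def query_servers(servers, prompts, model_name):
    """Fan prompt chunks out to all servers concurrently, preserving prompt order."""
    chunks = partition(prompts, len(servers))
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(query_batch, s, c, model_name)
                   for s, c in zip(servers, chunks)]
        results = []
        for f in futures:  # iterate in submission order to keep ordering stable
            results.extend(f.result())
    return results
```

A unit test of the partitioning invariant might look like this (hypothetical; the PR's actual tests are not reproduced here):

```python
def test_partition_even():
    prompts = list(range(10))
    chunks = partition(prompts, 3)
    assert [len(c) for c in chunks] == [4, 3, 3]      # near-even split
    assert [p for c in chunks for p in c] == prompts  # order preserved
```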
## User-facing Changes
### Usage Example (Offline + Multinode API mode)
```bash
python3 -u main.py --scenario Offline \
    --vllm \
    --api-model-name ${MODEL_NAME} \
    --api-server http://node1:8000 \
    --api-server http://node2:8000 \
    --api-server http://node3:8000 \
    --model-path ${CHECKPOINT_PATH} \
    --user-conf user.conf \
    --total-sample-count 24576 \
    --dataset-path ${DATASET_PATH} \
    --output-log-dir offline-logs
```
Each `--api-server` argument registers an endpoint; SUT_API distributes prompts across them automatically.
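The repeatable flag behaves like argparse's `append` action. A minimal sketch of how the endpoints might be collected (assumed; main.py's actual parser is not shown here):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--vllm", action="store_true",
                    help="Use the multi-server OpenAI-compatible API mode")
parser.add_argument("--api-server", action="append", default=[],
                    dest="api_servers",
                    help="Repeatable; each use registers one endpoint")

# Each repetition appends one endpoint URL to the list, in order.
args = parser.parse_args(
    ["--vllm", "--api-server", "http://node1:8000",
     "--api-server", "http://node2:8000"])
assert args.api_servers == ["http://node1:8000", "http://node2:8000"]
```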