
test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench)

Open venkywonka opened this issue 9 months ago • 33 comments

Description

  • Add some Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp backend, trtllm-bench).
  • This also exposes a --trust_remote_code flag in the build subcommand of trtllm-bench. The transformers library requires this flag to load DeciLM-based models (Llama-Nemotron-Super being one of them) through its Auto classes.
  • This PR also changes config.py and model.py so that the DeciLMForCausalLM classes default to trust_remote_code=True (previously False). This lets things work smoothly, without extra parametrization, when run from top-level trtllm-bench.
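To illustrate why the flag matters, here is a minimal sketch of loading a DeciLM-based config through the transformers Auto classes. It is not the code from this PR: the helper name `load_decilm_config` is hypothetical, and the model ID in the usage comment is assumed to be the Hugging Face repo ID. `AutoConfig.from_pretrained` and its `trust_remote_code` argument are the standard transformers API.

```python
def load_decilm_config(model_id: str, trust_remote_code: bool = True):
    """Load a model config whose class is defined in the model repo itself.

    Hypothetical helper for illustration. DeciLM-based checkpoints ship their
    own modeling/config code, so the Auto classes only load them when
    trust_remote_code=True is passed.
    """
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoConfig

    return AutoConfig.from_pretrained(model_id, trust_remote_code=trust_remote_code)


# Usage (needs network access and the transformers package; the repo ID is an assumption):
# config = load_decilm_config("nvidia/Llama-3_3-Nemotron-Super-49B-v1")
```

Defaulting the argument to True mirrors the behaviour this PR describes for the DeciLMForCausalLM classes. Callers who do not want to execute repo-provided code can still pass False explicitly.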

Performance Summary – llama_v3.3_nemotron_super_49b

| isl | osl | quant | con | backend | req/s | tps/gpu | avg latency (ms) | p50 latency (ms) |
|---|---|---|---|---|---|---|---|---|
| 5000 | 500 | none | 1 | cpp | 0.1075 | 13.4317 | 9306.1785 | 9305.0552 |
| 5000 | 500 | fp8 | 1 | cpp | 0.1485 | 18.5636 | 6733.4385 | 6730.6310 |
| 5000 | 500 | none | 250 | cpp | 0.6116 | 76.4499 | 317885.8769 | 401171.5739 |
| 5000 | 500 | fp8 | 250 | cpp | 0.7220 | 90.2495 | 269376.7776 | 340154.1910 |
| 500 | 2000 | none | 1 | cpp | 0.0304 | 15.2075 | 32878.3526 | 32877.1050 |
| 500 | 2000 | fp8 | 1 | cpp | 0.0435 | 21.7563 | 22981.6188 | 22975.2227 |
| 500 | 2000 | none | 250 | cpp | 0.3274 | 163.7098 | 589062.8547 | 733682.4485 |
| 500 | 2000 | fp8 | 250 | cpp | 0.4158 | 207.8830 | 463903.2804 | 577812.6816 |
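As a reading aid, the FP8 gain over the unquantized baseline can be computed directly from the tps/gpu column above. The sketch below only re-uses the reported numbers. The names `RESULTS` and `fp8_speedup` are illustrative and not part of trtllm-bench.

```python
# (isl, osl, concurrency) -> {quant: tps/gpu}, copied from the summary above.
RESULTS = {
    (5000, 500, 1): {"none": 13.4317, "fp8": 18.5636},
    (5000, 500, 250): {"none": 76.4499, "fp8": 90.2495},
    (500, 2000, 1): {"none": 15.2075, "fp8": 21.7563},
    (500, 2000, 250): {"none": 163.7098, "fp8": 207.8830},
}


def fp8_speedup(results):
    """Return the FP8 / baseline throughput ratio for each configuration."""
    return {cfg: tps["fp8"] / tps["none"] for cfg, tps in results.items()}


SPEEDUPS = fp8_speedup(RESULTS)

for (isl, osl, con), ratio in SPEEDUPS.items():
    print(f"isl={isl} osl={osl} con={con}: fp8 is {ratio:.2f}x the baseline")
```

On these numbers the ratio falls between roughly 1.18x and 1.43x, with the larger gains at concurrency 1.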

Run Invariants

  • Model: llama_v3.3_nemotron_super_49b
  • Backend: cpp (builds TensorRT engines)
  • Precision: BF16 baseline, FP8 quantized variants
  • Max batch size: 16
  • GPUs: 4 (per-GPU throughput shown above)
  • Benchmark tool: trtllm-bench
  • Synthetic dataset: 512 sequences per run
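The per-GPU throughput figures appear consistent with a simple relation: tps/gpu ≈ req/s × osl / GPUs, with GPUs = 4. This is an observation from the reported numbers, not a statement of how trtllm-bench defines the metric. The sketch below checks it against each row. The helper name `estimated_tps_per_gpu` is made up for illustration.

```python
GPUS = 4  # from the run invariants above

# (osl, req/s, reported tps/gpu), copied from the performance summary.
ROWS = [
    (500, 0.1075, 13.4317),
    (500, 0.1485, 18.5636),
    (500, 0.6116, 76.4499),
    (500, 0.7220, 90.2495),
    (2000, 0.0304, 15.2075),
    (2000, 0.0435, 21.7563),
    (2000, 0.3274, 163.7098),
    (2000, 0.4158, 207.8830),
]


def estimated_tps_per_gpu(req_per_s, osl, gpus=GPUS):
    """Output tokens per second per GPU, assuming every request emits osl tokens."""
    return req_per_s * osl / gpus


# Largest relative gap between the estimate and the reported value.
MAX_REL_ERR = max(
    abs(estimated_tps_per_gpu(req, osl) - reported) / reported
    for osl, req, reported in ROWS
)
print(f"max relative error: {MAX_REL_ERR:.4%}")
```

The estimate matches every row to within about 0.1%, a gap small enough to come from the rounding of req/s to four decimals.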

Execution Status Matrix

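| backend | isl | osl | quant | con | status |
|---|---|---|---|---|---|
| cpp | 5000 | 500 | none | 1 | TIMEOUT |
| cpp | 5000 | 500 | fp8 | 1 | TIMEOUT |
| cpp | 5000 | 500 | none | 250 | PASS |
| cpp | 5000 | 500 | fp8 | 250 | PASS |
| cpp | 500 | 2000 | none | 1 | TIMEOUT |
| cpp | 500 | 2000 | fp8 | 1 | TIMEOUT |
| cpp | 500 | 2000 | none | 250 | PASS |
| cpp | 500 | 2000 | fp8 | 250 | PASS |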

venkywonka avatar May 07 '25 16:05 venkywonka

/bot run --disable-fail-fast

venkywonka avatar May 07 '25 16:05 venkywonka

PR_Github #4410 [ run ] triggered by Bot

tensorrt-cicd avatar May 07 '25 16:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 07 '25 19:05 venkywonka

PR_Github #4421 [ run ] triggered by Bot

tensorrt-cicd avatar May 07 '25 19:05 tensorrt-cicd

PR_Github #4410 [ run ] completed with state ABORTED

tensorrt-cicd avatar May 07 '25 19:05 tensorrt-cicd

PR_Github #4421 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3183 completed with status: 'FAILURE'

tensorrt-cicd avatar May 07 '25 22:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 08 '25 04:05 venkywonka

PR_Github #4472 [ run ] triggered by Bot

tensorrt-cicd avatar May 08 '25 04:05 tensorrt-cicd

PR_Github #4472 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3209 completed with status: 'FAILURE'

tensorrt-cicd avatar May 08 '25 10:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 08 '25 14:05 venkywonka

PR_Github #4584 [ run ] triggered by Bot

tensorrt-cicd avatar May 08 '25 14:05 tensorrt-cicd

PR_Github #4584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3298 completed with status: 'SUCCESS'

tensorrt-cicd avatar May 08 '25 21:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 09 '25 22:05 venkywonka

PR_Github #4737 [ run ] triggered by Bot

tensorrt-cicd avatar May 09 '25 22:05 tensorrt-cicd

PR_Github #4737 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3419 completed with status: 'FAILURE'

tensorrt-cicd avatar May 10 '25 02:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 12 '25 20:05 venkywonka

PR_Github #4903 [ run ] triggered by Bot

tensorrt-cicd avatar May 12 '25 20:05 tensorrt-cicd

PR_Github #4903 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3555 completed with status: 'SUCCESS'

tensorrt-cicd avatar May 13 '25 01:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 13 '25 12:05 venkywonka

PR_Github #5014 [ run ] triggered by Bot

tensorrt-cicd avatar May 13 '25 12:05 tensorrt-cicd

PR_Github #5014 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3647 completed with status: 'FAILURE'

tensorrt-cicd avatar May 13 '25 20:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 14 '25 04:05 venkywonka

PR_Github #5109 [ run ] triggered by Bot

tensorrt-cicd avatar May 14 '25 04:05 tensorrt-cicd

PR_Github #5109 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3721 completed with status: 'FAILURE'

tensorrt-cicd avatar May 14 '25 09:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 14 '25 22:05 venkywonka

PR_Github #5216 [ run ] triggered by Bot

tensorrt-cicd avatar May 14 '25 22:05 tensorrt-cicd

PR_Github #5216 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3808 completed with status: 'SUCCESS'

tensorrt-cicd avatar May 15 '25 02:05 tensorrt-cicd

/bot run --disable-fail-fast

venkywonka avatar May 16 '25 19:05 venkywonka

PR_Github #5534 [ run ] triggered by Bot

tensorrt-cicd avatar May 16 '25 19:05 tensorrt-cicd

PR_Github #5534 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4035 completed with status: 'FAILURE'

tensorrt-cicd avatar May 16 '25 22:05 tensorrt-cicd