test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench)
Description
- Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (cpp backend, `trtllm-bench`).
- This also exposes a `--trust_remote_code` flag in the `trtllm-bench build` subcommand, which is required for the `transformers` library to load DeciLM-based models (Llama-Nemotron-Super being one of them) via the Auto classes.
- This PR also changes `config.py` and `model.py` so that the `DeciLMForCausalLM` classes default to `trust_remote_code=True` (previously `False`), so things work smoothly without extra parametrization when run from the top-level `trtllm-bench`.
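To illustrate what exposing the flag looks like, here is a minimal `argparse` sketch of a `build` subcommand with a `--trust_remote_code` switch. This is an illustrative sketch only, not the actual `trtllm-bench` source; the parser structure and option names besides `--trust_remote_code` are assumptions.

```python
import argparse

# Hypothetical sketch of a bench CLI exposing --trust_remote_code on the
# "build" subcommand (not the real trtllm-bench implementation).
parser = argparse.ArgumentParser(prog="trtllm-bench")
sub = parser.add_subparsers(dest="command")

build = sub.add_parser("build")
build.add_argument("--model", required=True)
build.add_argument(
    "--trust_remote_code",
    action="store_true",
    help="Forward trust_remote_code=True to the transformers Auto* loaders; "
         "required for DeciLM-based models such as Llama-Nemotron-Super.",
)

args = parser.parse_args(
    ["build", "--model", "Llama-3_3-Nemotron-Super-49B-v1", "--trust_remote_code"]
)
print(args.trust_remote_code)  # prints: True
```

The flag would then be forwarded to `transformers` calls such as `AutoConfig.from_pretrained(..., trust_remote_code=args.trust_remote_code)`, which is what lets checkpoints that ship custom modeling code load without an interactive prompt.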
Performance Summary – llama_v3.3_nemotron_super_49b
| isl | osl | quant | con | backend | req/s | tps/gpu | avg latency (ms) | p50 latency (ms) |
|---|---|---|---|---|---|---|---|---|
| 5000 | 500 | none | 1 | cpp | 0.1075 | 13.4317 | 9306.1785 | 9305.0552 |
| 5000 | 500 | fp8 | 1 | cpp | 0.1485 | 18.5636 | 6733.4385 | 6730.6310 |
| 5000 | 500 | none | 250 | cpp | 0.6116 | 76.4499 | 317885.8769 | 401171.5739 |
| 5000 | 500 | fp8 | 250 | cpp | 0.7220 | 90.2495 | 269376.7776 | 340154.1910 |
| 500 | 2000 | none | 1 | cpp | 0.0304 | 15.2075 | 32878.3526 | 32877.1050 |
| 500 | 2000 | fp8 | 1 | cpp | 0.0435 | 21.7563 | 22981.6188 | 22975.2227 |
| 500 | 2000 | none | 250 | cpp | 0.3274 | 163.7098 | 589062.8547 | 733682.4485 |
| 500 | 2000 | fp8 | 250 | cpp | 0.4158 | 207.8830 | 463903.2804 | 577812.6816 |
Run Invariants
- Model: `llama_v3.3_nemotron_super_49b`
- Backend: cpp (builds TensorRT engines)
- Precision: BF16 baseline, FP8-quantized variants
- Max batch size: 16
- GPUs: 4 (per-GPU throughput shown above)
- Benchmark tool: `trtllm-bench`
- Synthetic dataset: 512 sequences per run
Execution Status Matrix
| backend | isl | osl | quant | con | status |
|---|---|---|---|---|---|
| cpp | 5000 | 500 | none | 1 | TIMEOUT |
| cpp | 5000 | 500 | fp8 | 1 | TIMEOUT |
| cpp | 5000 | 500 | none | 250 | PASS |
| cpp | 5000 | 500 | fp8 | 250 | PASS |
| cpp | 500 | 2000 | none | 1 | TIMEOUT |
| cpp | 500 | 2000 | fp8 | 1 | TIMEOUT |
| cpp | 500 | 2000 | none | 250 | PASS |
| cpp | 500 | 2000 | fp8 | 250 | PASS |
/bot run --disable-fail-fast
PR_Github #4410 [ run ] triggered by Bot
/bot run --disable-fail-fast
PR_Github #4421 [ run ] triggered by Bot
PR_Github #4410 [ run ] completed with state ABORTED
PR_Github #4421 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3183 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #4472 [ run ] triggered by Bot
PR_Github #4472 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3209 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #4584 [ run ] triggered by Bot
PR_Github #4584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3298 completed with status: 'SUCCESS'
/bot run --disable-fail-fast
PR_Github #4737 [ run ] triggered by Bot
PR_Github #4737 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3419 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #4903 [ run ] triggered by Bot
PR_Github #4903 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3555 completed with status: 'SUCCESS'
/bot run --disable-fail-fast
PR_Github #5014 [ run ] triggered by Bot
PR_Github #5014 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3647 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #5109 [ run ] triggered by Bot
PR_Github #5109 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3721 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #5216 [ run ] triggered by Bot
PR_Github #5216 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3808 completed with status: 'SUCCESS'
/bot run --disable-fail-fast
PR_Github #5534 [ run ] triggered by Bot
PR_Github #5534 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4035 completed with status: 'FAILURE'