test(perf): Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (TRT flow, trtllm-bench)
Description
- Add some `Llama-3_3-Nemotron-Super-49B-v1` integration-perf-tests (cpp backend, `trtllm-bench`).
- This also exposes a `--trust_remote_code` flag in the `trtllm-bench build` subcommand, which is required for the `transformers` library to load DeciLM-based models (Llama-Nemotron-Super being one of them) via the Auto classes.
- This PR also changes `config.py` and `model.py` so that the `DeciLMForCausalLM` classes default to `trust_remote_code=True` (previously `False`), so things work smoothly without extra parametrization when run from the top-level `trtllm-bench`.
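To illustrate what exposing the flag looks like, here is a minimal `argparse` sketch of a `build` subcommand with a `--trust_remote_code` switch. This is an illustrative sketch only, not the actual `trtllm-bench` source; the parser structure and option names besides `--trust_remote_code` are assumptions.

```python
import argparse

# Hypothetical sketch of a bench CLI exposing --trust_remote_code on the
# "build" subcommand (not the real trtllm-bench implementation).
parser = argparse.ArgumentParser(prog="trtllm-bench")
sub = parser.add_subparsers(dest="command")

build = sub.add_parser("build")
build.add_argument("--model", required=True)
build.add_argument(
    "--trust_remote_code",
    action="store_true",
    help="Forward trust_remote_code=True to the transformers Auto* loaders; "
         "required for DeciLM-based models such as Llama-Nemotron-Super.",
)

args = parser.parse_args(
    ["build", "--model", "Llama-3_3-Nemotron-Super-49B-v1", "--trust_remote_code"]
)
print(args.trust_remote_code)  # prints: True
```

The flag would then be forwarded to `transformers` calls such as `AutoConfig.from_pretrained(..., trust_remote_code=args.trust_remote_code)`, which is what lets checkpoints that ship custom modeling code load without an interactive prompt.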
Performance Summary – llama_v3.3_nemotron_super_49b
| isl | osl | quant | con | backend | req/s | tps/gpu | avg latency (ms) | p50 latency (ms) |
|---|---|---|---|---|---|---|---|---|
| 5000 | 500 | none | 1 | cpp | 0.1075 | 13.4317 | 9306.1785 | 9305.0552 |
| 5000 | 500 | fp8 | 1 | cpp | 0.1485 | 18.5636 | 6733.4385 | 6730.6310 |
| 5000 | 500 | none | 250 | cpp | 0.6116 | 76.4499 | 317885.8769 | 401171.5739 |
| 5000 | 500 | fp8 | 250 | cpp | 0.7220 | 90.2495 | 269376.7776 | 340154.1910 |
| 500 | 2000 | none | 1 | cpp | 0.0304 | 15.2075 | 32878.3526 | 32877.1050 |
| 500 | 2000 | fp8 | 1 | cpp | 0.0435 | 21.7563 | 22981.6188 | 22975.2227 |
| 500 | 2000 | none | 250 | cpp | 0.3274 | 163.7098 | 589062.8547 | 733682.4485 |
| 500 | 2000 | fp8 | 250 | cpp | 0.4158 | 207.8830 | 463903.2804 | 577812.6816 |
Run Invariants
- Model: `llama_v3.3_nemotron_super_49b`
- Backend: cpp (builds TensorRT engines)
- Precision: BF16 baseline, FP8-quantized variants
- Max batch size: 16
- GPUs: 4 (per-GPU throughput shown above)
- Benchmark tool: `trtllm-bench`
- Synthetic dataset: 512 sequences per run
Execution Status Matrix
| backend | isl | osl | quant | con | status |
|---|---|---|---|---|---|
| cpp | 5000 | 500 | none | 1 | TIMEOUT |
| cpp | 5000 | 500 | fp8 | 1 | TIMEOUT |
| cpp | 5000 | 500 | none | 250 | PASS |
| cpp | 5000 | 500 | fp8 | 250 | PASS |
| cpp | 500 | 2000 | none | 1 | TIMEOUT |
| cpp | 500 | 2000 | fp8 | 1 | TIMEOUT |
| cpp | 500 | 2000 | none | 250 | PASS |
| cpp | 500 | 2000 | fp8 | 250 | PASS |
/bot run --disable-fail-fast
PR_Github #4410 [ run ] triggered by Bot
/bot run --disable-fail-fast
PR_Github #4421 [ run ] triggered by Bot
PR_Github #4410 [ run ] completed with state ABORTED
PR_Github #4421 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3183 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #4472 [ run ] triggered by Bot
PR_Github #4472 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3209 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #4584 [ run ] triggered by Bot
PR_Github #4584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3298 completed with status: 'SUCCESS'
/bot run --disable-fail-fast
PR_Github #4737 [ run ] triggered by Bot
PR_Github #4737 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3419 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #4903 [ run ] triggered by Bot
PR_Github #4903 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3555 completed with status: 'SUCCESS'
/bot run --disable-fail-fast
PR_Github #5014 [ run ] triggered by Bot
PR_Github #5014 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3647 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #5109 [ run ] triggered by Bot
PR_Github #5109 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3721 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #5216 [ run ] triggered by Bot
PR_Github #5216 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3808 completed with status: 'SUCCESS'
/bot run --disable-fail-fast
PR_Github #5534 [ run ] triggered by Bot
PR_Github #5534 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4035 completed with status: 'FAILURE'