[TRTC-1943][feat] Env vars override support in LLM API
Description
Context for reviewers:
- Currently, to get optimal deployments, we expect the user to specify an optimal `extra_llm_api_options` YAML file as well as the right env vars (as in this example in the InferenceMax scripts).
- One of the low-hanging fruits for improving user UX is to consolidate this env var information within the `extra_llm_api_options.yml` file itself.
- Enabling a `--config` alias for `--extra_llm_api_options` aligns the UX with vLLM and makes intuitive sense.
- This PR lays the foundation for a hermetic recipe database of optimal configs that the user can deploy directly. Towards this larger effort, two follow-up PRs that depend on this are in the pipeline:
  - #9160 by @FrankD412
  - #9272 by @anish-shanbhag
Checklist
- [x] Add a `--config` alias for `--extra_llm_api_options`. Both can now be used interchangeably across `trtllm-serve`, `trtllm-eval`, and `trtllm-bench`.
- [x] Add `env_overrides` to `LlmArgs` and update `api_stability/references/llm.yaml` accordingly. The override is applied in `BaseLLM.__init__` for the parent process, with a sanity update inside `worker_main` so that spawned workers don't inherit an outdated env snapshot. This automatically makes the overrides specifiable through the config/`extra_llm_api_options` YAML. A minimal sketch of this flow follows the checklist.
- [x] Add test coverage for `--config` aliasing.
- [x] Add test coverage for `env_overrides` in the LLM API, with a real example propagating it into the worker process.
- [x] `TRTLLM_ENABLE_PDL` was being cached at import time. Change it to be read from the environment on demand so that the override is respected.
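To make that flow concrete, here is a minimal sketch of the override application that `BaseLLM.__init__` performs for the parent process and `worker_main` repeats for spawned workers. The helper name `apply_env_overrides` and the logging format are illustrative assumptions, not the exact code in this PR.

```python
import logging
import os
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def apply_env_overrides(env_overrides: Optional[Dict[str, str]]) -> None:
    """Write user-specified overrides into the current process environment.

    Hypothetical helper: BaseLLM.__init__ applies the overrides in the parent
    process, and worker_main re-applies them so spawned workers do not keep an
    outdated environment snapshot.
    """
    if not env_overrides:
        return
    for key, value in env_overrides.items():
        old = os.environ.get(key)
        os.environ[key] = str(value)  # environment values must be strings
        logger.info("env override: %s: %r -> %r", key, old, value)
```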
Extended YAML example:

```yaml
# config.yaml

# Existing LLM API configuration (unchanged)
cuda_graph_config:
  enable_padding: true
  max_batch_size: 256
enable_attention_dp: true
kv_cache_config:
  dtype: fp8
  enable_block_reuse: false
  free_gpu_memory_fraction: 0.85
print_iter_log: true

# NEW: Environment variable overrides
env_overrides:
  TRTLLM_ENABLE_PDL: 1
  NCCL_GRAPH_REGISTER: 1
```

```bash
export TRTLLM_ENABLE_PDL=0
trtllm-serve MODEL --config config.yaml  # the config's TRTLLM_ENABLE_PDL: 1 overrides the exported value
```
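The same overrides can also be passed programmatically, since `env_overrides` is now part of the `LLM` constructor signature (see the API stability reference update below). A minimal sketch; the model path is a placeholder:

```python
from tensorrt_llm import LLM

# Placeholder model path; env_overrides is the new keyword introduced by this PR.
llm = LLM(
    model="/path/to/model",
    env_overrides={
        "TRTLLM_ENABLE_PDL": "1",
        "NCCL_GRAPH_REGISTER": "1",
    },
)
```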
Known limitations / Future Work
- Some env vars are cached into module-level constants at import time of `tensorrt_llm`, so overriding them before LLM launch has no effect. They should be read on demand where they are actually used (which happens after our overrides are applied, so the overrides can take effect); a sketch of the pattern follows this list.
- The logger's env var (`TLLM_LOG_LEVEL`) is a special case: it binds to a singleton at import time, and once set there it can only be overridden by calling `logger.set_level()`.
- The current env overrides section does not validate env var names, so it can override arbitrary env vars and will not report, warn, or raise if a given env var is unused or invalid. Addressing this requires a global list of all possible env vars in use, which is beyond the scope of this initial PR.
- A global env manager would solve most of the above problems by tracking all env vars, applying the overrides, and ensuring they are read lazily.
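To illustrate the import-time-caching problem and the on-demand pattern `TRTLLM_ENABLE_PDL` was moved to, here is a minimal sketch. It is not the PR's exact implementation: the real `get_env_enable_pdl()` uses a `_printed` flag for one-time logging, whereas this sketch uses `functools.lru_cache` as an assumed alternative.

```python
import functools
import logging
import os

logger = logging.getLogger(__name__)

# Anti-pattern: evaluated once at import time, so later env_overrides are ignored.
ENABLE_PDL_AT_IMPORT = os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"


def get_env_enable_pdl() -> bool:
    """Read TRTLLM_ENABLE_PDL on demand so config-driven overrides take effect."""
    enabled = os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"
    _log_pdl_state_once(enabled)
    return enabled


@functools.lru_cache(maxsize=None)
def _log_pdl_state_once(enabled: bool) -> None:
    # lru_cache fires the log once per distinct state (assumed approach, not the PR's _printed flag).
    logger.info("PDL enabled: %s", enabled)
```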
Test Coverage
examples/serve/test_serve.py::test_config_file_loading
examples/serve/test_serve.py::test_env_overrides_pdl
Summary by CodeRabbit
- New Features
  - Added a `--config` CLI flag alias for easier configuration file specification across serve, eval, and benchmark commands.
  - Introduced environment variable override capability through configuration files.
  - Enhanced PDL (Programmatic Dependent Launch) logging with runtime enablement status visibility.
- Tests
  - Added environment variable override validation tests and expanded config loading test coverage.
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- [x] Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message.
See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Walkthrough
This PR adds a --config CLI flag alias for --extra_llm_api_options across benchmark and serve commands, introduces environment variable override support via a new env_overrides field, replaces static PDL detection with dynamic environment-based toggling, and updates related integration tests accordingly.
Changes
| Cohort / File(s) | Summary |
|---|---|
| CLI flag aliasing `tensorrt_llm/bench/benchmark/low_latency.py`, `tensorrt_llm/bench/benchmark/throughput.py`, `tensorrt_llm/commands/serve.py`, `tensorrt_llm/commands/eval.py` | Added `--config` as an alias for `--extra_llm_api_options` across four CLI commands, mapping both flags to the same destination and updating help text to reflect dual usage. |
| Low-latency benchmark environment handling `tensorrt_llm/bench/benchmark/low_latency.py` | Removed in-place `os.environ` assignments; introduced a `default_env_overrides` dict merged with user-provided overrides and stored back into `kwargs["env_overrides"]` prior to benchmark setup (see the sketch below the table). Added minor formatting. |
| Environment override field infrastructure `tensorrt_llm/llmapi/llm_args.py` | Added `env_overrides: Optional[Dict[str, str]]` field to the `BaseLlmArgs` model with description "Environment variable overrides." |
| Environment override processing `tensorrt_llm/llmapi/llm.py`, `tensorrt_llm/executor/worker.py` | Added processing of `env_overrides` from constructor kwargs in `BaseLLM.__init__` to apply environment variable overrides with logging; added pre-usage env synchronization in `worker_main` to update `os.environ` when overrides are present. |
| PDL dynamic detection `tensorrt_llm/_torch/flashinfer_utils.py`, `tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py`, `tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py` | Removed the static `ENABLE_PDL` constant; updated `get_env_enable_pdl()` to add one-time logging of the "PDL enabled" state; replaced all usage of `ENABLE_PDL` with calls to `get_env_enable_pdl()`. |
| Integration test refactoring `tests/integration/defs/examples/serve/test_serve.py` | Renamed `test_extra_llm_api_options` to `test_config_file_loading` with parametrization over both flag types; added a new `test_env_overrides_pdl` test validating environment override application via config file; added necessary imports (`queue`, `subprocess`, `threading`, `pytest`, `yaml`). |
| Test list updates `tests/integration/test_lists/qa/llm_function_core.txt`, `tests/integration/test_lists/qa/llm_function_nim.txt` | Updated test references, replacing `test_extra_llm_api_options` with `test_config_file_loading`; added `test_env_overrides_pdl` to the core list; moved negative test cases to a separate file. |
| API stability reference `tests/unittest/api_stability/references/llm.yaml` | Added the `env_overrides: Optional[Dict[str, str]]` parameter to the `__init__` method signature reference. |
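As referenced in the low-latency benchmark row above, here is a minimal sketch of the merge pattern. The names follow the walkthrough (`default_env_overrides`, `kwargs["env_overrides"]`), but the default value shown is a placeholder and user-provided values are assumed to take precedence.

```python
from typing import Any, Dict


def merge_benchmark_env_overrides(kwargs: Dict[str, Any]) -> None:
    """Merge benchmark default env overrides with user-provided ones in place."""
    default_env_overrides: Dict[str, str] = {
        "TRTLLM_ENABLE_PDL": "1",  # placeholder default, not necessarily the benchmark's real set
    }
    user_overrides: Dict[str, str] = kwargs.get("env_overrides") or {}
    # Later entries win in dict unpacking, so user-provided overrides take precedence (assumed).
    kwargs["env_overrides"] = {**default_env_overrides, **user_overrides}
```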
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant CLI
    participant LLM as BaseLLM.__init__
    participant Worker as worker_main
    participant Env as os.environ

    CLI->>LLM: Pass env_overrides in kwargs
    LLM->>LLM: Process env_overrides dict
    LLM->>Env: Update os.environ with overrides
    LLM->>LLM: Log old -> new values
    Note over Worker: Spawned MPI Process
    Worker->>Worker: Check llm_args.env_overrides
    Worker->>Env: Apply overrides to process env
    Worker->>Worker: Continue initialization
```

```mermaid
sequenceDiagram
    participant Config as Config File
    participant Serve as trtllm-serve
    participant Env as os.environ
    participant PDL as get_env_enable_pdl()
    participant Flashinfer as flashinfer ops

    Config->>Serve: Load env_overrides (TRTLLM_ENABLE_PDL)
    Serve->>Env: Apply env overrides
    Flashinfer->>PDL: Query PDL state at runtime
    PDL->>PDL: Read env var dynamically
    PDL->>PDL: Log "PDL enabled" once per state change
    PDL->>Flashinfer: Return boolean
```
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
- Environment override plumbing: Verify that `env_overrides` correctly flows from CLI config files through `BaseLlmArgs` to both `BaseLLM.__init__` and `worker_main`, and that values are properly cast to strings before environment application.
- PDL dynamic detection side effect: Ensure the one-time logging in `get_env_enable_pdl()` via the `_printed` flag is thread-safe and that logging doesn't occur at inappropriate times; verify all call sites now use `get_env_enable_pdl()` instead of the removed `ENABLE_PDL` constant.
- Test coverage: Confirm `test_env_overrides_pdl` correctly validates both enabled and disabled PDL states, properly captures server logs, and handles threading/process management without race conditions (a hedged unit-level sketch follows this list).
- CLI alias consistency: Verify that all four commands (`low_latency`, `throughput`, `serve`, `eval`) use consistent help text and properly map the new `--config` flag to `extra_llm_api_options`.
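The integration test `test_env_overrides_pdl` exercises the full path by launching `trtllm-serve` and inspecting server logs. Below is a narrower, hedged sketch of how the core precedence behavior could be checked at the unit level; the `apply_env_overrides` helper is the same hypothetical one sketched earlier in the description, not the PR's actual code.

```python
import os
from typing import Dict


def apply_env_overrides(env_overrides: Dict[str, str]) -> None:
    # Hypothetical helper, mirroring the earlier sketch of the override application.
    for key, value in env_overrides.items():
        os.environ[key] = str(value)


def test_env_overrides_win_over_shell_value(monkeypatch):
    # Simulate the shell exporting TRTLLM_ENABLE_PDL=0 before launch...
    monkeypatch.setenv("TRTLLM_ENABLE_PDL", "0")
    # ...then the config-driven override flipping it to 1.
    apply_env_overrides({"TRTLLM_ENABLE_PDL": "1"})
    assert os.environ["TRTLLM_ENABLE_PDL"] == "1"
```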
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main feature: environment variable override support in the LLM API, with proper JIRA ticket format and [feat] type. |
| Description check | ✅ Passed | PR description is comprehensive, well-structured, and addresses all required template sections with clear explanations of context, changes, test coverage, and checklists. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
/bot run
PR_Github #24419 [ run ] triggered by Bot. Commit: 4061c65
PR_Github #24419 [ run ] completed with state SUCCESS. Commit: 4061c65
/LLM/main/L0_MergeRequest_PR pipeline #18426 completed with status: 'FAILURE'
@coderabbitai help
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
  - `I pushed a fix in commit <commit_id>, please review it.`
  - `Open a follow-up GitHub issue for this discussion.`
- Files and specific lines of code (under the "Files changed" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query.
- PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
  - `@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.`
  - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.`
CodeRabbit commands
These commands are invoked using PR/Issue comments.
- `@coderabbitai pause` to pause the reviews on a PR.
- `@coderabbitai resume` to resume the paused reviews.
- `@coderabbitai review` to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
- `@coderabbitai full review` to do a full review from scratch and review all the files again.
- `@coderabbitai summary` to regenerate the summary of the PR.
- `@coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>]` to validate the custom pre-merge checks instructions. Defaults to `error` when `--mode` is omitted.
  - `@coderabbitai ignore pre-merge checks` to override pre-merge checks and get an approval on PR.
  - `@coderabbitai run pre-merge checks` to run pre-merge checks on the pull request.
- `@coderabbitai generate docstrings` to generate docstrings for this PR.
- `@coderabbitai generate sequence diagram` to generate a sequence diagram of the changes in this PR.
- `@coderabbitai generate unit tests` to generate unit tests for this PR.
- `@coderabbitai resolve` to resolve all the CodeRabbit review comments.
- `@coderabbitai configuration` to show the current CodeRabbit configuration for the repository.
- `@coderabbitai help` to get help.
Other keywords and placeholders
- Add `@coderabbitai ignore` or `@coderabbit ignore` anywhere in the PR description to prevent this PR from being reviewed.
- Add `@coderabbitai summary` to generate the high-level summary at a specific location in the PR description.
- Add `@coderabbitai` or `@coderabbitai title` anywhere in the PR title to generate the title automatically.
Status, support, documentation and community
- Visit our status page to check the current availability of CodeRabbit.
- Create a ticket on our support page for assistance with any issues or questions.
- Visit our documentation site for detailed information on how to use CodeRabbit.
- Join our Discord community to connect with other users and get help from the community.
- Follow us on X/Twitter for updates and announcements.
@coderabbitai full review
✅ Actions performed
Full review triggered.
/bot run
@coderabbitai full review
✅ Actions performed
Full review triggered.
PR_Github #24552 [ run ] triggered by Bot. Commit: 33d922e
PR_Github #24552 [ run ] completed with state SUCCESS. Commit: 33d922e
/LLM/main/L0_MergeRequest_PR pipeline #18533 completed with status: 'FAILURE'
/bot run
PR_Github #24570 [ run ] triggered by Bot. Commit: 33d922e
PR_Github #24570 [ run ] completed with state SUCCESS. Commit: 33d922e
/LLM/main/L0_MergeRequest_PR pipeline #18546 completed with status: 'FAILURE'
/bot run
PR_Github #24817 [ run ] triggered by Bot. Commit: 80ba775
PR_Github #24817 [ run ] completed with state SUCCESS. Commit: 80ba775
/LLM/main/L0_MergeRequest_PR pipeline #18729 completed with status: 'FAILURE'
@coderabbitai full review
✅ Actions performed
Full review triggered.
/bot run
PR_Github #25002 [ run ] triggered by Bot. Commit: f56f0cb
/bot run
PR_Github #25100 [ run ] triggered by Bot. Commit: f0d7bdf
/bot run
PR_Github #25103 [ run ] triggered by Bot. Commit: bf7dd27
PR_Github #25100 [ run ] completed with state ABORTED. Commit: f0d7bdf
LLM/main/L0_MergeRequest_PR #18975 (Blue Ocean) completed with status: ABORTED
PR_Github #25103 [ run ] completed with state FAILURE. Commit: bf7dd27
/LLM/main/L0_MergeRequest_PR pipeline #18978 completed with status: 'FAILURE'
/bot run