[TRTC-1943][feat] Env vars override support in LLM API
Description
Context for reviewers:
- Currently, to get optimal deployments, we expect the user to specify an optimal `extra_llm_api_options` YAML file as well as the right env vars (as in this example in the InferenceMax scripts).
- One of the low-hanging fruits for improving user UX is to consolidate this env var information within the `extra_llm_api_options.yml` file itself.
- Enabling a `--config` alias for `--extra_llm_api_options` aligns the UX with vLLM and makes intuitive sense.
- This PR lays the foundation for a hermetic recipe database of optimal configs that the user can deploy directly. Towards this larger effort, two follow-up PRs that depend on this are in the pipeline:
  - #9160 by @FrankD412
  - #9272 by @anish-shanbhag
Checklist
- [x] Add a `--config` alias for `--extra_llm_api_options`. Both can now be used interchangeably across `trtllm-serve`, `trtllm-eval`, and `trtllm-bench`.
- [x] Add `env_overrides` to `LlmArgs` and update `api_stability/references/llm.yaml` accordingly. The override is applied in `BaseLLM.__init__` for the parent process, with a sanity update inside `worker_main` so that spawned workers don't inherit an outdated env snapshot. This automatically makes the overrides specifiable through the config/`extra_llm_api_options` YAML. A minimal sketch of this flow follows the checklist.
- [x] Add test coverage for `--config` aliasing.
- [x] Add test coverage for `env_overrides` in the LLM API, with a real example propagating it into the worker process.
- [x] `TRTLLM_ENABLE_PDL` was being cached at import time. Change it to be read from the environment on demand so that the override is respected.
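To make that flow concrete, here is a minimal sketch of the override application that `BaseLLM.__init__` performs for the parent process and `worker_main` repeats for spawned workers. The helper name `apply_env_overrides` and the logging format are illustrative assumptions, not the exact code in this PR.

```python
import logging
import os
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def apply_env_overrides(env_overrides: Optional[Dict[str, str]]) -> None:
    """Write user-specified overrides into the current process environment.

    Hypothetical helper: BaseLLM.__init__ applies the overrides in the parent
    process, and worker_main re-applies them so spawned workers do not keep an
    outdated environment snapshot.
    """
    if not env_overrides:
        return
    for key, value in env_overrides.items():
        old = os.environ.get(key)
        os.environ[key] = str(value)  # environment values must be strings
        logger.info("env override: %s: %r -> %r", key, old, value)
```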
Extended YAML example:

```yaml
# config.yaml

# Existing LLM API configuration (unchanged)
cuda_graph_config:
  enable_padding: true
  max_batch_size: 256
enable_attention_dp: true
kv_cache_config:
  dtype: fp8
  enable_block_reuse: false
  free_gpu_memory_fraction: 0.85
print_iter_log: true

# NEW: Environment variable overrides
env_overrides:
  TRTLLM_ENABLE_PDL: 1
  NCCL_GRAPH_REGISTER: 1
```

```bash
export TRTLLM_ENABLE_PDL=0
trtllm-serve MODEL --config config.yaml  # the config's TRTLLM_ENABLE_PDL: 1 overrides the exported value
```
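The same overrides can also be passed programmatically, since `env_overrides` is now part of the `LLM` constructor signature (see the API stability reference update below). A minimal sketch; the model path is a placeholder:

```python
from tensorrt_llm import LLM

# Placeholder model path; env_overrides is the new keyword introduced by this PR.
llm = LLM(
    model="/path/to/model",
    env_overrides={
        "TRTLLM_ENABLE_PDL": "1",
        "NCCL_GRAPH_REGISTER": "1",
    },
)
```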
Known limitations / Future Work
- Some env vars are cached into module-level constants at import time of `tensorrt_llm`, so overriding them before LLM launch has no effect. They should be read on demand where they are actually used (which happens after our overrides are applied, so the overrides can take effect); a sketch of the pattern follows this list.
- The logger's env var (`TLLM_LOG_LEVEL`) is a special case: it binds to a singleton at import time, and once set there it can only be overridden by calling `logger.set_level()`.
- The current env overrides section does not validate env var names, so it can override arbitrary env vars and will not report, warn, or raise if a given env var is unused or invalid. Addressing this requires a global list of all possible env vars in use, which is beyond the scope of this initial PR.
- A global env manager would solve most of the above problems by tracking all env vars, applying the overrides, and ensuring they are read lazily.
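To illustrate the import-time-caching problem and the on-demand pattern `TRTLLM_ENABLE_PDL` was moved to, here is a minimal sketch. It is not the PR's exact implementation: the real `get_env_enable_pdl()` uses a `_printed` flag for one-time logging, whereas this sketch uses `functools.lru_cache` as an assumed alternative.

```python
import functools
import logging
import os

logger = logging.getLogger(__name__)

# Anti-pattern: evaluated once at import time, so later env_overrides are ignored.
ENABLE_PDL_AT_IMPORT = os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"


def get_env_enable_pdl() -> bool:
    """Read TRTLLM_ENABLE_PDL on demand so config-driven overrides take effect."""
    enabled = os.environ.get("TRTLLM_ENABLE_PDL", "0") == "1"
    _log_pdl_state_once(enabled)
    return enabled


@functools.lru_cache(maxsize=None)
def _log_pdl_state_once(enabled: bool) -> None:
    # lru_cache fires the log once per distinct state (assumed approach, not the PR's _printed flag).
    logger.info("PDL enabled: %s", enabled)
```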
Test Coverage
examples/serve/test_serve.py::test_config_file_loading
examples/serve/test_serve.py::test_env_overrides_pdl
Summary by CodeRabbit
- New Features
  - Added a `--config` CLI flag alias for easier configuration file specification across serve, eval, and benchmark commands.
  - Introduced environment variable override capability through configuration files.
  - Enhanced PDL (Programmatic Dependent Launch) logging with runtime enablement status visibility.
- Tests
  - Added environment variable override validation tests and expanded config loading test coverage.
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- [x] Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message.
See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Walkthrough
This PR adds a --config CLI flag alias for --extra_llm_api_options across benchmark and serve commands, introduces environment variable override support via a new env_overrides field, replaces static PDL detection with dynamic environment-based toggling, and updates related integration tests accordingly.
Changes
| Cohort / File(s) | Summary |
|---|---|
| CLI flag aliasing `tensorrt_llm/bench/benchmark/low_latency.py`, `tensorrt_llm/bench/benchmark/throughput.py`, `tensorrt_llm/commands/serve.py`, `tensorrt_llm/commands/eval.py` | Added `--config` as an alias for `--extra_llm_api_options` across four CLI commands, mapping both flags to the same destination and updating help text to reflect dual usage. |
| Low-latency benchmark environment handling `tensorrt_llm/bench/benchmark/low_latency.py` | Removed in-place `os.environ` assignments; introduced a `default_env_overrides` dict merged with user-provided overrides and stored back into `kwargs["env_overrides"]` prior to benchmark setup (see the sketch below the table). Added minor formatting. |
| Environment override field infrastructure `tensorrt_llm/llmapi/llm_args.py` | Added `env_overrides: Optional[Dict[str, str]]` field to the `BaseLlmArgs` model with description "Environment variable overrides." |
| Environment override processing `tensorrt_llm/llmapi/llm.py`, `tensorrt_llm/executor/worker.py` | Added processing of `env_overrides` from constructor kwargs in `BaseLLM.__init__` to apply environment variable overrides with logging; added pre-usage env synchronization in `worker_main` to update `os.environ` when overrides are present. |
| PDL dynamic detection `tensorrt_llm/_torch/flashinfer_utils.py`, `tensorrt_llm/_torch/custom_ops/flashinfer_custom_ops.py`, `tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py` | Removed the static `ENABLE_PDL` constant; updated `get_env_enable_pdl()` to add one-time logging of the "PDL enabled" state; replaced all usage of `ENABLE_PDL` with calls to `get_env_enable_pdl()`. |
| Integration test refactoring `tests/integration/defs/examples/serve/test_serve.py` | Renamed `test_extra_llm_api_options` to `test_config_file_loading` with parametrization over both flag types; added a new `test_env_overrides_pdl` test validating environment override application via config file; added necessary imports (`queue`, `subprocess`, `threading`, `pytest`, `yaml`). |
| Test list updates `tests/integration/test_lists/qa/llm_function_core.txt`, `tests/integration/test_lists/qa/llm_function_nim.txt` | Updated test references, replacing `test_extra_llm_api_options` with `test_config_file_loading`; added `test_env_overrides_pdl` to the core list; moved negative test cases to a separate file. |
| API stability reference `tests/unittest/api_stability/references/llm.yaml` | Added the `env_overrides: Optional[Dict[str, str]]` parameter to the `__init__` method signature reference. |
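As referenced in the low-latency benchmark row above, here is a minimal sketch of the merge pattern. The names follow the walkthrough (`default_env_overrides`, `kwargs["env_overrides"]`), but the default value shown is a placeholder and user-provided values are assumed to take precedence.

```python
from typing import Any, Dict


def merge_benchmark_env_overrides(kwargs: Dict[str, Any]) -> None:
    """Merge benchmark default env overrides with user-provided ones in place."""
    default_env_overrides: Dict[str, str] = {
        "TRTLLM_ENABLE_PDL": "1",  # placeholder default, not necessarily the benchmark's real set
    }
    user_overrides: Dict[str, str] = kwargs.get("env_overrides") or {}
    # Later entries win in dict unpacking, so user-provided overrides take precedence (assumed).
    kwargs["env_overrides"] = {**default_env_overrides, **user_overrides}
```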
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant CLI
    participant LLM as BaseLLM.__init__
    participant Worker as worker_main
    participant Env as os.environ

    CLI->>LLM: Pass env_overrides in kwargs
    LLM->>LLM: Process env_overrides dict
    LLM->>Env: Update os.environ with overrides
    LLM->>LLM: Log old -> new values
    Note over Worker: Spawned MPI Process
    Worker->>Worker: Check llm_args.env_overrides
    Worker->>Env: Apply overrides to process env
    Worker->>Worker: Continue initialization
```

```mermaid
sequenceDiagram
    participant Config as Config File
    participant Serve as trtllm-serve
    participant Env as os.environ
    participant PDL as get_env_enable_pdl()
    participant Flashinfer as flashinfer ops

    Config->>Serve: Load env_overrides (TRTLLM_ENABLE_PDL)
    Serve->>Env: Apply env overrides
    Flashinfer->>PDL: Query PDL state at runtime
    PDL->>PDL: Read env var dynamically
    PDL->>PDL: Log "PDL enabled" once per state change
    PDL->>Flashinfer: Return boolean
```
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
- Environment override plumbing: Verify that `env_overrides` correctly flows from CLI config files through `BaseLlmArgs` to both `BaseLLM.__init__` and `worker_main`, and that values are properly cast to strings before environment application.
- PDL dynamic detection side effect: Ensure the one-time logging in `get_env_enable_pdl()` via the `_printed` flag is thread-safe and that logging doesn't occur at inappropriate times; verify all call sites now use `get_env_enable_pdl()` instead of the removed `ENABLE_PDL` constant.
- Test coverage: Confirm `test_env_overrides_pdl` correctly validates both enabled and disabled PDL states, properly captures server logs, and handles threading/process management without race conditions (a hedged unit-level sketch follows this list).
- CLI alias consistency: Verify that all four commands (`low_latency`, `throughput`, `serve`, `eval`) use consistent help text and properly map the new `--config` flag to `extra_llm_api_options`.
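The integration test `test_env_overrides_pdl` exercises the full path by launching `trtllm-serve` and inspecting server logs. Below is a narrower, hedged sketch of how the core precedence behavior could be checked at the unit level; the `apply_env_overrides` helper is the same hypothetical one sketched earlier in the description, not the PR's actual code.

```python
import os
from typing import Dict


def apply_env_overrides(env_overrides: Dict[str, str]) -> None:
    # Hypothetical helper, mirroring the earlier sketch of the override application.
    for key, value in env_overrides.items():
        os.environ[key] = str(value)


def test_env_overrides_win_over_shell_value(monkeypatch):
    # Simulate the shell exporting TRTLLM_ENABLE_PDL=0 before launch...
    monkeypatch.setenv("TRTLLM_ENABLE_PDL", "0")
    # ...then the config-driven override flipping it to 1.
    apply_env_overrides({"TRTLLM_ENABLE_PDL": "1"})
    assert os.environ["TRTLLM_ENABLE_PDL"] == "1"
```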
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main feature: environment variable override support in the LLM API, with proper JIRA ticket format and [feat] type. |
| Description check | ✅ Passed | PR description is comprehensive, well-structured, and addresses all required template sections with clear explanations of context, changes, test coverage, and checklists. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
/bot run
PR_Github #24419 [ run ] triggered by Bot. Commit: 4061c65
PR_Github #24419 [ run ] completed with state SUCCESS. Commit: 4061c65
/LLM/main/L0_MergeRequest_PR pipeline #18426 completed with status: 'FAILURE'
@coderabbitai help
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
  - `I pushed a fix in commit <commit_id>, please review it.`
  - `Open a follow-up GitHub issue for this discussion.`
- Files and specific lines of code (under the "Files changed" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query.
- PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
  - `@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.`
  - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.`
CodeRabbit commands
These commands are invoked using PR/Issue comments.
- `@coderabbitai pause` to pause the reviews on a PR.
- `@coderabbitai resume` to resume the paused reviews.
- `@coderabbitai review` to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
- `@coderabbitai full review` to do a full review from scratch and review all the files again.
- `@coderabbitai summary` to regenerate the summary of the PR.
- `@coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>]` to validate the custom pre-merge checks instructions. Defaults to `error` when `--mode` is omitted.
  - `@coderabbitai ignore pre-merge checks` to override pre-merge checks and get an approval on PR.
  - `@coderabbitai run pre-merge checks` to run pre-merge checks on the pull request.
- `@coderabbitai generate docstrings` to generate docstrings for this PR.
- `@coderabbitai generate sequence diagram` to generate a sequence diagram of the changes in this PR.
- `@coderabbitai generate unit tests` to generate unit tests for this PR.
- `@coderabbitai resolve` to resolve all the CodeRabbit review comments.
- `@coderabbitai configuration` to show the current CodeRabbit configuration for the repository.
- `@coderabbitai help` to get help.
Other keywords and placeholders
- Add `@coderabbitai ignore` or `@coderabbit ignore` anywhere in the PR description to prevent this PR from being reviewed.
- Add `@coderabbitai summary` to generate the high-level summary at a specific location in the PR description.
- Add `@coderabbitai` or `@coderabbitai title` anywhere in the PR title to generate the title automatically.
Status, support, documentation and community
- Visit our status page to check the current availability of CodeRabbit.
- Create a ticket on our support page for assistance with any issues or questions.
- Visit our documentation site for detailed information on how to use CodeRabbit.
- Join our Discord community to connect with other users and get help from the community.
- Follow us on X/Twitter for updates and announcements.
@coderabbitai full review
✅ Actions performed
Full review triggered.
/bot run
@coderabbitai full review
✅ Actions performed
Full review triggered.
PR_Github #24552 [ run ] triggered by Bot. Commit: 33d922e
PR_Github #24552 [ run ] completed with state SUCCESS. Commit: 33d922e
/LLM/main/L0_MergeRequest_PR pipeline #18533 completed with status: 'FAILURE'
/bot run
PR_Github #24570 [ run ] triggered by Bot. Commit: 33d922e
PR_Github #24570 [ run ] completed with state SUCCESS. Commit: 33d922e
/LLM/main/L0_MergeRequest_PR pipeline #18546 completed with status: 'FAILURE'
/bot run
PR_Github #24817 [ run ] triggered by Bot. Commit: 80ba775
PR_Github #24817 [ run ] completed with state SUCCESS. Commit: 80ba775
/LLM/main/L0_MergeRequest_PR pipeline #18729 completed with status: 'FAILURE'
@coderabbitai full review
✅ Actions performed
Full review triggered.
/bot run
PR_Github #25002 [ run ] triggered by Bot. Commit: f56f0cb
/bot run
PR_Github #25100 [ run ] triggered by Bot. Commit: f0d7bdf
/bot run
PR_Github #25103 [ run ] triggered by Bot. Commit: bf7dd27
PR_Github #25100 [ run ] completed with state ABORTED. Commit: f0d7bdf
LLM/main/L0_MergeRequest_PR #18975 (Blue Ocean) completed with status: ABORTED
PR_Github #25103 [ run ] completed with state FAILURE. Commit: bf7dd27
/LLM/main/L0_MergeRequest_PR pipeline #18978 completed with status: 'FAILURE'
/bot run