
chore: [TRTLLM-3694] Move functional args to llmargs

Open · hchings opened this pull request 10 months ago • 24 comments

Moving args that are shared across the TRT and PyTorch backends out of BuildConfig and into first-level LLMArgs, as discussed with Chunwei.
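For illustration, a minimal before/after sketch of the usage change, with max_batch_size as a representative shared knob; the import path and exact signatures are assumptions inferred from this thread, not a verified reference:

```python
# Sketch only: import path and signatures are assumed, not verified.
from tensorrt_llm import LLM, BuildConfig

# Before: shared knobs had to be nested inside BuildConfig.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          build_config=BuildConfig(max_batch_size=8))

# After: knobs shared across the TRT and PyTorch backends are accepted as
# first-level LLM arguments (LLMArgs).
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
          max_batch_size=8)
```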

hchings · Mar 24 '25 18:03

/bot run

hchings · Mar 24 '25 18:03

PR_Github #325 [ run ] triggered by Bot

niukuo · Mar 24 '25 19:03

PR_Github #325 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #306 completed with status: 'FAILURE'

niukuo · Mar 24 '25 20:03

/bot run

hchings · Mar 24 '25 21:03

PR_Github #332 [ run ] triggered by Bot

niukuo · Mar 24 '25 21:03

PR_Github #332 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #312 completed with status: 'SUCCESS'

niukuo · Mar 25 '25 00:03

/bot run

hchings · Mar 26 '25 06:03

PR_Github #528 [ run ] triggered by Bot

niukuo · Mar 26 '25 06:03

PR_Github #528 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #451 completed with status: 'FAILURE'

niukuo · Mar 26 '25 08:03

/bot run

hchings · Mar 26 '25 22:03

PR_Github #619 [ run ] triggered by Bot

tensorrt-cicd · Mar 26 '25 22:03

PR_Github #619 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #521 completed with status: 'FAILURE'

tensorrt-cicd · Mar 27 '25 00:03

/bot run

hchings · Mar 27 '25 03:03

PR_Github #633 [ run ] triggered by Bot

tensorrt-cicd · Mar 27 '25 03:03

PR_Github #633 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #532 completed with status: 'FAILURE'

tensorrt-cicd · Mar 27 '25 04:03

/bot run

hchings · Mar 27 '25 05:03

PR_Github #642 [ run ] triggered by Bot

tensorrt-cicd · Mar 27 '25 05:03

PR_Github #642 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #542 completed with status: 'FAILURE'

tensorrt-cicd · Mar 27 '25 07:03

/bot run

hchings · Mar 28 '25 00:03

PR_Github #662 [ run ] triggered by Bot

tensorrt-cicd · Mar 28 '25 00:03

PR_Github #662 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #557 completed with status: 'FAILURE'

tensorrt-cicd · Mar 28 '25 01:03

/bot run

hchings · Mar 28 '25 04:03

PR_Github #671 [ run ] triggered by Bot

tensorrt-cicd · Mar 28 '25 04:03

PR_Github #671 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #564 completed with status: 'SUCCESS'

tensorrt-cicd · Mar 28 '25 11:03

/bot reuse-pipeline

hchings · Mar 28 '25 18:03

PR_Github #682 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd · Mar 28 '25 18:03

PR_Github #682 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #671 for commit e9b9826

tensorrt-cicd · Mar 28 '25 18:03

@hchings should LLM() instantiations in tensorrt_llm/bench be changed to match this new interface? For example, https://github.com/NVIDIA/TensorRT-LLM/blob/435cd2983db9327b9c4aac4f299e69d3c2954d34/tensorrt_llm/bench/benchmark/throughput.py#L230

amukkara · Apr 01 '25 01:04

> @hchings should LLM() instantiations in tensorrt_llm/bench be changed to match this new interface? For example, https://github.com/NVIDIA/TensorRT-LLM/blob/435cd2983db9327b9c4aac4f299e69d3c2954d34/tensorrt_llm/bench/benchmark/throughput.py#L230

@amukkara trtllm-bench shares its code path between the TRT and PyTorch backends, which may be tricky to unify. The functional knobs touched in this MR make the usage differ between the two backends, and that is expected. For instance, the PyTorch backend requires LLM(max_batch_size=...) and build_config is invalid there, while for the TRT backend it is fine to use LLM(build_config=BuildConfig(max_batch_size=...)). In the long run, we will converge on the PyTorch path. cc @hchings to correct me if I'm wrong. cc @FrankD412 for visibility.

Superjomn · Apr 01 '25 03:04

@Superjomn The distinction between the TRT and PyTorch backends makes sense. That said, there is one instance where BuildConfig is still used in the PyTorch path, and as a result max_seq_len cannot be correctly initialized. @FrankD412 I think the fix would be a one-line change here.
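A hypothetical sketch of such a fix, assuming the PyTorch path should pass the user-supplied value as a first-level LLM argument rather than routing it through BuildConfig; the variable names below are illustrative, not the actual throughput.py code:

```python
from tensorrt_llm import LLM  # assumed import, as elsewhere in this thread

model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model
max_seq_len = 4096                            # placeholder value

# Assumed before (PyTorch path still routing the knob through BuildConfig,
# so max_seq_len is never initialized from the user-supplied value):
#   llm = LLM(model=model, build_config=BuildConfig(max_seq_len=max_seq_len))

# Assumed after: pass the promoted first-level LLM argument directly.
llm = LLM(model=model, max_seq_len=max_seq_len)
```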

amukkara · Apr 01 '25 05:04