TensorRT-LLM

feat: NIXL interface integration

Open Shixiaowei02 opened this issue 9 months ago • 42 comments

This PR, in conjunction with PR 3769, provides an interface for dynamically linking NIXL.

Shixiaowei02 avatar Apr 29 '25 06:04 Shixiaowei02

/bot run

Shixiaowei02 avatar Apr 29 '25 15:04 Shixiaowei02

PR_Github #3740 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 29 '25 15:04 tensorrt-cicd

PR_Github #3740 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2648 completed with status: 'FAILURE'

tensorrt-cicd avatar Apr 30 '25 00:04 tensorrt-cicd

/bot run

Shixiaowei02 avatar Apr 30 '25 01:04 Shixiaowei02

PR_Github #3785 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 30 '25 01:04 tensorrt-cicd

PR_Github #3785 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #2678 completed with status: 'SUCCESS'

tensorrt-cicd avatar Apr 30 '25 09:04 tensorrt-cicd

/bot run

Shixiaowei02 avatar May 12 '25 13:05 Shixiaowei02

PR_Github #4881 [ run ] triggered by Bot

tensorrt-cicd avatar May 12 '25 13:05 tensorrt-cicd

PR_Github #4881 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3535 completed with status: 'FAILURE'

tensorrt-cicd avatar May 12 '25 15:05 tensorrt-cicd

@Shixiaowei02, you will need to move the installation of NIXL back into the devel stage in docker. Otherwise, this MR will break the build of the TRT-LLM release image, since it introduces a compile-time dependency, if I understand this correctly.

@MartinMarciniszyn Okay, I have modified the installation in the image, which will increase the overall image size by approximately 150 MB. I will follow up by installing precompiled libraries directly instead of building from source, to shrink the image size. Currently, NIXL can be disabled during the build, but it will be required for testing in the pre-merge/post-merge pipeline.

Shixiaowei02 avatar May 13 '25 08:05 Shixiaowei02

/bot run

Shixiaowei02 avatar May 13 '25 08:05 Shixiaowei02

PR_Github #4983 [ run ] triggered by Bot

tensorrt-cicd avatar May 13 '25 08:05 tensorrt-cicd

PR_Github #4983 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3621 completed with status: 'FAILURE'

tensorrt-cicd avatar May 13 '25 09:05 tensorrt-cicd

/bot run

Shixiaowei02 avatar May 13 '25 10:05 Shixiaowei02

PR_Github #4994 [ run ] triggered by Bot

tensorrt-cicd avatar May 13 '25 10:05 tensorrt-cicd

PR_Github #4994 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3630 completed with status: 'FAILURE'

tensorrt-cicd avatar May 13 '25 10:05 tensorrt-cicd

/bot run

Shixiaowei02 avatar May 14 '25 02:05 Shixiaowei02

PR_Github #5095 [ run ] triggered by Bot

tensorrt-cicd avatar May 14 '25 02:05 tensorrt-cicd

PR_Github #5095 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #3710 completed with status: 'FAILURE'

tensorrt-cicd avatar May 14 '25 04:05 tensorrt-cicd

/bot run

Shixiaowei02 avatar May 14 '25 06:05 Shixiaowei02

PR_Github #5118 [ run ] triggered by Bot

tensorrt-cicd avatar May 14 '25 06:05 tensorrt-cicd

/bot run

Shixiaowei02 avatar May 14 '25 14:05 Shixiaowei02

PR_Github #5185 [ run ] triggered by Bot

tensorrt-cicd avatar May 14 '25 14:05 tensorrt-cicd

PR_Github #5118 [ run ] completed with state ABORTED

tensorrt-cicd avatar May 14 '25 14:05 tensorrt-cicd

/bot run

Shixiaowei02 avatar May 15 '25 01:05 Shixiaowei02

PR_Github #5228 [ run ] triggered by Bot

tensorrt-cicd avatar May 15 '25 01:05 tensorrt-cicd

PR_Github #5185 [ run ] completed with state ABORTED

tensorrt-cicd avatar May 15 '25 01:05 tensorrt-cicd

/bot run --disable-fail-fast

Shixiaowei02 avatar May 15 '25 01:05 Shixiaowei02

PR_Github #5232 [ run ] triggered by Bot

tensorrt-cicd avatar May 15 '25 01:05 tensorrt-cicd

PR_Github #5228 [ run ] completed with state ABORTED

tensorrt-cicd avatar May 15 '25 01:05 tensorrt-cicd