feat: Support cos_sin_cache in all cases.

Open yuxianq opened this issue 10 months ago • 21 comments

This MR contains the following updates:

  1. Handle both fuse_pos_embd=True and fuse_pos_embd=False, and create RotaryEmbedding inside the attention module, so that users don't need to handle it in the modeling files.
  2. Cache cos_sin for the unfused RoPE implementation. If flashinfer is available, use apply_rope_with_cos_sin_cache_inplace instead of apply_rope_inplace. Otherwise, fall back to a pure PyTorch implementation, which now supports any RoPE type.
  3. Use create_rope_const_params to create and cache cos_sin_cache for all RoPE types, including DeepSeek YaRN RoPE.
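
To illustrate what a cached cos/sin table buys the unfused path, here is a minimal framework-free sketch. It is not the code in this MR: the helper names build_cos_sin_cache and apply_rope are hypothetical, the cache layout (cos values in the first half of each row, sin values in the second half) and the rotate-half pairing are assumptions, and plain Python lists stand in for tensors. The trigonometry is computed once per position and then only looked up when applying the rotation.

```python
import math


def build_cos_sin_cache(max_positions, head_dim, base=10000.0):
    """Precompute one row per position: [cos(angles)..., sin(angles)...].

    Hypothetical helper; the row layout (cos half, then sin half) is an
    assumption made for this sketch.
    """
    half = head_dim // 2
    # Standard RoPE inverse frequencies: base^(-2i / head_dim).
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(half)]
    cache = []
    for pos in range(max_positions):
        angles = [pos * f for f in inv_freq]
        cache.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return cache


def apply_rope(vec, position, cache):
    """Rotate one head vector using the cached row for `position`.

    Uses the rotate-half convention: element i is paired with element
    i + head_dim // 2 (an assumption; interleaved pairing also exists).
    """
    half = len(vec) // 2
    row = cache[position]
    cos, sin = row[:half], row[half:]
    x1, x2 = vec[:half], vec[half:]
    out1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    out2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return out1 + out2


cache = build_cos_sin_cache(max_positions=16, head_dim=4)
q = [1.0, 2.0, 3.0, 4.0]
rotated = apply_rope(q, 7, cache)  # lookup only; no trig at apply time
```

Because every RoPE variant differs only in how the angles are generated, a table like this lets one apply routine serve all of them, which is the motivation for item 3.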

yuxianq avatar Mar 24 '25 09:03 yuxianq

/bot run --add-multi-gpu-test

yuxianq avatar Mar 24 '25 09:03 yuxianq

PR_Github #283 [ run ] triggered by Bot

niukuo avatar Mar 24 '25 09:03 niukuo

PR_Github #283 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #272 completed with status: 'FAILURE'

niukuo avatar Mar 24 '25 11:03 niukuo

/bot run --add-multi-gpu-test

yuxianq avatar Mar 25 '25 06:03 yuxianq

PR_Github #387 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 06:03 niukuo

PR_Github #387 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #345 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 07:03 niukuo

/bot run --add-multi-gpu-test

yuxianq avatar Mar 25 '25 12:03 yuxianq

PR_Github #430 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 12:03 niukuo

PR_Github #430 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #369 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 14:03 niukuo

/bot run --add-multi-gpu-test

yuxianq avatar Mar 26 '25 04:03 yuxianq

PR_Github #510 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 04:03 niukuo

I think I was pinged by mistake. Was the review request actually meant for @litaotju?

BestJuly avatar Mar 26 '25 05:03 BestJuly

PR_Github #510 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #437 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 05:03 niukuo

@yuxianq Can we split this PR into several smaller PRs? For example, the first item could be a PR on its own.

  1. Handle fuse_pos_embd=True/False and create RotaryEmbedding inside attention module, so that the users don't need to handle it in the modeling files.

QiJune avatar Mar 26 '25 06:03 QiJune

Can we split this PR into several smaller PRs? For example, the first item could be a PR on its own.

@QiJune I will give it a try. Let me get CI passing first to validate that these features work correctly.

yuxianq avatar Mar 26 '25 07:03 yuxianq

/bot run --add-multi-gpu-test

yuxianq avatar Mar 26 '25 08:03 yuxianq

PR_Github #550 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 08:03 niukuo

PR_Github #550 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #469 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 11:03 niukuo

/bot run --disable-fail-fast --add-multi-gpu-test

yuxianq avatar Mar 26 '25 12:03 yuxianq

PR_Github #584 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 12:03 niukuo

PR_Github #584 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #497 completed with status: 'FAILURE'

niukuo avatar Mar 26 '25 15:03 niukuo

/bot run --disable-fail-fast --stage-list "A30-7"

yuxianq avatar Apr 02 '25 08:04 yuxianq

/bot run --disable-fail-fast --stage-list "A30-7"

yuxianq avatar Apr 02 '25 09:04 yuxianq

PR_Github #1005 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 02 '25 09:04 tensorrt-cicd

PR_Github #1005 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #776 (Partly Tested) completed with status: 'FAILURE'

tensorrt-cicd avatar Apr 02 '25 13:04 tensorrt-cicd

/bot run --disable-fail-fast --add-multi-gpu-test

yuxianq avatar Apr 02 '25 15:04 yuxianq

PR_Github #1030 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 02 '25 15:04 tensorrt-cicd

PR_Github #1030 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #794 completed with status: 'FAILURE'

tensorrt-cicd avatar Apr 02 '25 18:04 tensorrt-cicd

/bot run --disable-fail-fast --add-multi-gpu-test

yuxianq avatar Apr 03 '25 07:04 yuxianq

PR_Github #1088 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 03 '25 07:04 tensorrt-cicd