feat: Support cos_sin_cache in all cases.
This PR contains the following updates:
- Handle `fuse_pos_embd=True/False` and create `RotaryEmbedding` inside the attention module, so that users don't need to handle it in the modeling files.
- Cache `cos_sin` for the unfused rope implementation. If flashinfer is available, use `apply_rope_with_cos_sin_cache_inplace` instead of `apply_rope_inplace`. Otherwise, we fall back to a pure PyTorch implementation, which can now support any rope type.
- We use `create_rope_const_params` to create and cache `cos_sin_cache` for all rope types, including Deepseek yarn rope.
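To illustrate the caching idea behind the second and third items, here is a minimal numpy sketch: the cos/sin tables are precomputed once for all positions (analogous to what `create_rope_const_params` caches), and the unfused rope application then becomes a cheap lookup plus rotation. The helper names and the neox-style rotate-half layout are assumptions for illustration, not the actual TensorRT-LLM implementation.

```python
import numpy as np

def create_cos_sin_cache(head_dim, max_pos, base=10000.0):
    # Hypothetical helper mirroring the role of create_rope_const_params:
    # precompute cos/sin for every position once, then reuse across layers.
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    freqs = np.outer(np.arange(max_pos), inv_freq)  # [max_pos, head_dim // 2]
    return np.cos(freqs), np.sin(freqs)

def apply_rope_unfused(x, pos, cos, sin):
    # Unfused rope on one head vector, neox-style rotate-half formulation:
    # the cached cos/sin rows for `pos` rotate each (x1, x2) pair in place.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    c, s = cos[pos], sin[pos]
    return np.concatenate([x1 * c - x2 * s, x2 * c + x1 * s], axis=-1)
```

With flashinfer available, the same cached table would instead be handed to `apply_rope_with_cos_sin_cache_inplace`, so both paths share one `cos_sin_cache`.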
/bot run --add-multi-gpu-test
PR_Github #283 [ run ] triggered by Bot
PR_Github #283 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #272 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test
PR_Github #387 [ run ] triggered by Bot
PR_Github #387 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #345 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test
PR_Github #430 [ run ] triggered by Bot
PR_Github #430 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #369 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test
PR_Github #510 [ run ] triggered by Bot
I think I was pinged by mistake; is the review request actually intended for @litaotju?
PR_Github #510 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #437 completed with status: 'FAILURE'
@yuxianq Can we split this PR to several small PRs? For example, the first item can be a single PR.
> - Handle `fuse_pos_embd=True/False` and create `RotaryEmbedding` inside the attention module, so that users don't need to handle it in the modeling files.

> Can we split this PR to several small PRs? For example, the first item can be a single PR.
@QiJune I will give it a try. Let me pass the CI first to validate that these features work correctly.
/bot run --add-multi-gpu-test
PR_Github #550 [ run ] triggered by Bot
PR_Github #550 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #469 completed with status: 'FAILURE'
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #584 [ run ] triggered by Bot
PR_Github #584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #497 completed with status: 'FAILURE'
/bot run --disable-fail-fast --stage-list "A30-7"
/bot run --disable-fail-fast --stage-list "A30-7"
PR_Github #1005 [ run ] triggered by Bot
PR_Github #1005 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #776 (Partly Tested) completed with status: 'FAILURE'
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #1030 [ run ] triggered by Bot
PR_Github #1030 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #794 completed with status: 'FAILURE'
/bot run --disable-fail-fast --add-multi-gpu-test
PR_Github #1088 [ run ] triggered by Bot