
fix: disable KV cache reuse if using attention sink
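A minimal sketch of the behavior the title describes, assuming a simple options object: if attention-sink tokens are configured, KV cache block reuse is switched off with a warning. The names `KvCacheOptions`, `enable_block_reuse`, `sink_token_length`, and `resolve_kv_cache_options` are hypothetical and illustrative only; they are not taken from this PR's diff.

```python
import warnings
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KvCacheOptions:
    # Hypothetical option names, for illustration only.
    enable_block_reuse: bool = False
    sink_token_length: int = 0


def resolve_kv_cache_options(opts: KvCacheOptions) -> KvCacheOptions:
    """Return options with block reuse turned off when attention sinks are used."""
    if opts.sink_token_length > 0 and opts.enable_block_reuse:
        warnings.warn(
            "KV cache block reuse is disabled because attention sink tokens "
            "are configured."
        )
        # Frozen dataclass: build a modified copy instead of mutating in place.
        return replace(opts, enable_block_reuse=False)
    return opts


if __name__ == "__main__":
    opts = resolve_kv_cache_options(
        KvCacheOptions(enable_block_reuse=True, sink_token_length=4)
    )
    print(opts.enable_block_reuse)  # False
```

Raising a `ValueError` instead of returning a modified copy would match the reviewer's suggestion in this thread to prohibit the configuration outright rather than silently adjust it.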

Open Funatiq opened this issue 1 year ago • 25 comments

Funatiq • Mar 24 '25 09:03

/bot run

Funatiq • Mar 24 '25 09:03

PR_Github #284 [ run ] triggered by Bot

niukuo • Mar 24 '25 09:03

I wonder whether this should be more invasive, i.e., also erase the logic that includes the sink bubble length in reuse, like maxTokenNum. Or are you waiting for me to do so in my VSWA PR?

I would suggest prohibiting the configuration first. Then everyone is free to refactor under that assumption.

Funatiq • Mar 24 '25 10:03
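The review comment refers to the sink bubble length. A minimal sketch, assuming the bubble is the padding that rounds the sink tokens up to a whole number of KV cache blocks, illustrates how it shifts the block in which later prompt tokens land. The helpers `sink_bubble_length` and `block_index` are hypothetical. Any reuse bookkeeping that counts the bubble, such as a maxTokenNum-style bound, has to carry this offset, which is the logic the reviewer suggests erasing.

```python
def sink_bubble_length(sink_token_len: int, tokens_per_block: int) -> int:
    """Padding (in tokens) that rounds the sink tokens up to a full block.

    Hypothetical helper: assumes the sink tokens occupy their own block(s) and
    the unused tail of the last sink block is left empty (the "bubble").
    """
    remainder = sink_token_len % tokens_per_block
    return 0 if remainder == 0 else tokens_per_block - remainder


def block_index(pos: int, sink_token_len: int, tokens_per_block: int) -> int:
    """Block holding prompt token `pos`, with the bubble inserted after the sinks."""
    bubble = sink_bubble_length(sink_token_len, tokens_per_block)
    # Tokens after the sink region are shifted right by the bubble.
    slot = pos if pos < sink_token_len else pos + bubble
    return slot // tokens_per_block


if __name__ == "__main__":
    # 4 sink tokens, 16 tokens per block: 12 padding slots follow the sinks.
    print(sink_bubble_length(4, 16))  # 12
    # Prompt token 4 lands in block 1 with sinks, but in block 0 without them.
    print(block_index(4, 4, 16), block_index(4, 0, 16))  # 1 0
```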

PR_Github #284 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #273 completed with status: 'FAILURE'

niukuo • Mar 24 '25 12:03

/bot run

Funatiq • Mar 24 '25 13:03

PR_Github #299 [ run ] triggered by Bot

niukuo • Mar 24 '25 13:03

PR_Github #299 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #286 completed with status: 'FAILURE'

niukuo • Mar 24 '25 14:03

/bot run

Funatiq • Mar 24 '25 15:03

PR_Github #318 [ run ] triggered by Bot

niukuo • Mar 24 '25 16:03

PR_Github #318 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #303 completed with status: 'FAILURE'

niukuo • Mar 24 '25 18:03

/bot run

Funatiq • Mar 25 '25 07:03

PR_Github #395 [ run ] triggered by Bot

niukuo • Mar 25 '25 07:03

PR_Github #395 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #351 completed with status: 'FAILURE'

niukuo • Mar 25 '25 09:03

/bot run --disable-fail-fast

Funatiq • Mar 25 '25 09:03

PR_Github #423 [ run ] triggered by Bot

niukuo • Mar 25 '25 09:03

PR_Github #423 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #364 completed with status: 'FAILURE'

niukuo • Mar 25 '25 17:03

/bot run --disable-fail-fast

Funatiq • Mar 25 '25 18:03

PR_Github #463 [ run ] triggered by Bot

niukuo • Mar 25 '25 18:03

PR_Github #463 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #396 completed with status: 'SUCCESS'

niukuo • Mar 25 '25 21:03

/bot reuse-pipeline

Funatiq • Mar 26 '25 07:03

PR_Github #538 [ reuse-pipeline ] triggered by Bot

niukuo • Mar 26 '25 07:03

PR_Github #538 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #463 for commit e57a93b

niukuo • Mar 26 '25 07:03

/bot reuse-pipeline

Funatiq • Mar 26 '25 07:03

PR_Github #539 [ reuse-pipeline ] triggered by Bot

niukuo • Mar 26 '25 07:03

PR_Github #539 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #463 for commit 812d7dc

niukuo • Mar 26 '25 07:03

/bot reuse-pipeline

Funatiq • Apr 15 '25 13:04

PR_Github #2337 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd • Apr 15 '25 13:04

PR_Github #2337 [ reuse-pipeline ] completed with state SUCCESS Can't reuse PR_Github #0 with status: UNKNOWN

tensorrt-cicd • Apr 15 '25 13:04

/bot reuse-pipeline

Funatiq • Apr 15 '25 18:04

PR_Github #2358 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd • Apr 15 '25 18:04