
refactor: Simplify disableLookahead and improve numDecodingEngineTokens handling

Open Funatiq opened this issue 10 months ago • 6 comments

  • Move numDecodingEngineTokens from DecoderState->mJointDecodingInput to DecoderState itself.
    • It is not needed in the decoding inputs; it is used by the outer decoding loop.
  • Simplify disableLookahead.
    • Don't take the batch size as a parameter; use the DecoderState's current mMaxBatchSize instead.
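
The shape of the change described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual TensorRT-LLM code: the simplified `DecoderState` class, its constructor, the getter, the helper `totalEngineTokens`, the per-slot vector layout, and the assumption that disabling lookahead resets each slot to one token per engine step are all hypothetical. Only the names `numDecodingEngineTokens`, `disableLookahead`, and `mMaxBatchSize` come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using SizeType32 = std::int32_t;

// Hypothetical, simplified stand-in for the decoder state (not the real class).
class DecoderState
{
public:
    DecoderState(SizeType32 maxBatchSize, SizeType32 tokensPerEngineStep)
        : mMaxBatchSize{maxBatchSize}
        , mNumDecodingEngineTokens(static_cast<std::size_t>(maxBatchSize), tokensPerEngineStep)
    {
        assert(maxBatchSize > 0 && tokensPerEngineStep > 0);
    }

    // No batch-size parameter: the state already knows mMaxBatchSize.
    // Assumption: without lookahead, each slot decodes one token per step.
    void disableLookahead()
    {
        for (auto& numTokens : mNumDecodingEngineTokens)
        {
            numTokens = 1;
        }
    }

    // The counter lives on the state itself, not inside a decoding-input struct.
    SizeType32 getNumDecodingEngineTokens(SizeType32 slot) const
    {
        return mNumDecodingEngineTokens.at(static_cast<std::size_t>(slot));
    }

    SizeType32 getMaxBatchSize() const
    {
        return mMaxBatchSize;
    }

private:
    SizeType32 mMaxBatchSize;
    std::vector<SizeType32> mNumDecodingEngineTokens;
};

// Hypothetical outer-loop helper: reads the counters directly from the state.
SizeType32 totalEngineTokens(DecoderState const& state)
{
    SizeType32 total = 0;
    for (SizeType32 slot = 0; slot < state.getMaxBatchSize(); ++slot)
    {
        total += state.getNumDecodingEngineTokens(slot);
    }
    return total;
}
```

In this sketch a caller would construct `DecoderState state{4, 3};` and later call `state.disableLookahead();` with no arguments, so the batch size passed by a caller can no longer drift from the one the state was set up with.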

Funatiq avatar Mar 26 '25 11:03 Funatiq

/bot run

Funatiq avatar Mar 26 '25 11:03 Funatiq

PR_Github #577 [ run ] triggered by Bot

niukuo avatar Mar 26 '25 11:03 niukuo

PR_Github #577 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #492 completed with status: 'SUCCESS'

niukuo avatar Mar 26 '25 14:03 niukuo

/bot run

Funatiq avatar Mar 28 '25 07:03 Funatiq

PR_Github #675 [ run ] triggered by Bot

tensorrt-cicd avatar Mar 28 '25 07:03 tensorrt-cicd

PR_Github #675 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #566 completed with status: 'SUCCESS'

tensorrt-cicd avatar Mar 28 '25 13:03 tensorrt-cicd

/bot reuse-pipeline

Funatiq avatar Apr 01 '25 10:04 Funatiq

PR_Github #894 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd avatar Apr 01 '25 10:04 tensorrt-cicd

PR_Github #894 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #675 for commit cfb248d

tensorrt-cicd avatar Apr 01 '25 10:04 tensorrt-cicd