TensorRT-LLM refactor: Simplify disableLookahead and improve numDecodingEngineTokens handling

Move numDecodingEngineTokens from DecoderState->mJointDecodingInput to DecoderState itself.
- It is not needed as a decoding input; it is only used in the outer decoding loop.
Simplify disableLookahead
- Instead of taking the batch size as a parameter, use the DecoderState's current mMaxBatchSize.
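
A minimal C++ sketch of the resulting ownership, for illustration only. The types and members here (`JointDecodingInput`, `DecoderState`, `mMaxDecodingEngineTokens`, the get/set accessors) are hypothetical, heavily simplified stand-ins and not the actual TensorRT-LLM code. The behavior of `disableLookahead()` (resetting every slot to one engine token per step) is also an assumption. The sketch shows the per-slot token counts living directly on the decoder state, and `disableLookahead()` iterating over the state's own `mMaxBatchSize` instead of a caller-supplied batch size.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins; NOT the real TensorRT-LLM types.
using SizeType32 = std::int32_t;

struct JointDecodingInput {
    // Other decoding inputs would live here; numDecodingEngineTokens no longer does.
};

class DecoderState {
public:
    DecoderState(SizeType32 maxBatchSize, SizeType32 maxDecodingEngineTokens)
        : mMaxBatchSize{maxBatchSize},
          mMaxDecodingEngineTokens{maxDecodingEngineTokens},
          mNumDecodingEngineTokens(static_cast<std::size_t>(maxBatchSize), 1) {}

    // Per-slot token count, read by the outer decoding loop rather than by the decoder inputs.
    SizeType32 getNumDecodingEngineTokens(SizeType32 slot) const {
        return mNumDecodingEngineTokens.at(static_cast<std::size_t>(slot));
    }

    void setNumDecodingEngineTokens(SizeType32 slot, SizeType32 numTokens) {
        if (numTokens < 1 || numTokens > mMaxDecodingEngineTokens) {
            throw std::invalid_argument("numTokens out of range");
        }
        mNumDecodingEngineTokens.at(static_cast<std::size_t>(slot)) = numTokens;
    }

    SizeType32 getMaxDecodingEngineTokens() const { return mMaxDecodingEngineTokens; }

    // No batch-size parameter: the state already knows its own mMaxBatchSize.
    // Assumption: without lookahead, every slot decodes one engine token per step.
    void disableLookahead() {
        mMaxDecodingEngineTokens = 1;
        for (SizeType32 slot = 0; slot < mMaxBatchSize; ++slot) {
            mNumDecodingEngineTokens[static_cast<std::size_t>(slot)] = 1;
        }
    }

private:
    SizeType32 mMaxBatchSize;
    SizeType32 mMaxDecodingEngineTokens;
    std::vector<SizeType32> mNumDecodingEngineTokens;  // moved out of JointDecodingInput
    JointDecodingInput mJointDecodingInput;
};
```

With this shape, a caller writes `state.disableLookahead();` and cannot pass a batch size that disagrees with the one the state was configured for.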

Mar 26 '25 11:03 Funatiq

/bot run

Mar 26 '25 11:03 Funatiq

PR_Github #577 [ run ] triggered by Bot

Mar 26 '25 11:03 niukuo

PR_Github #577 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #492 completed with status: 'SUCCESS'

Mar 26 '25 14:03 niukuo

/bot run

Mar 28 '25 07:03 Funatiq

PR_Github #675 [ run ] triggered by Bot

Mar 28 '25 07:03 tensorrt-cicd

PR_Github #675 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #566 completed with status: 'SUCCESS'

Mar 28 '25 13:03 tensorrt-cicd

/bot reuse-pipeline

Apr 01 '25 10:04 Funatiq

PR_Github #894 [ reuse-pipeline ] triggered by Bot

Apr 01 '25 10:04 tensorrt-cicd

PR_Github #894 [ reuse-pipeline ] completed with state SUCCESS Reusing PR_Github #675 for commit cfb248d

Apr 01 '25 10:04 tensorrt-cicd