perf: Optimisations for PP + attention DP
- Remove the MPI world broadcast in fetch_adp_new_requests.
- Synchronize the request finish point across the last and intermediate PP ranks to avoid a deadlock in trtllm-bench runs.
/bot run
PR_Github #667 [ run ] triggered by Bot
PR_Github #667 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #561 completed with status: 'SUCCESS'
/bot run
PR_Github #800 [ run ] triggered by Bot
PR_Github #800 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #646 completed with status: 'FAILURE'
/bot run
PR_Github #802 [ run ] triggered by Bot
PR_Github #802 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #648 completed with status: 'SUCCESS'
/bot reuse-pipeline
PR_Github #813 [ reuse-pipeline ] triggered by Bot
PR_Github #813 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #802 for commit b901251