[WIP][SYCL][HostTask] Optimize blocked users tracking
This commit partially addresses a performance issue observed when submitting consecutive host tasks to an in-order queue without an explicit wait(). The execution time of each host task was found to increase significantly as the number of submissions grew:
https://github.com/intel/llvm/issues/18500.
The major cause was identified as the unnecessary tracking of indirect blocking dependencies in MBlockedUsers. Previously, all direct and indirect blocking relations between enqueued commands were tracked, causing a significant increase in notification time upon task completion. For example, in a sequence of tasks A, B, C, D, A.MBlockedUsers would redundantly include {C, D}, even though these tasks are already blocked by B.
To resolve this, the enqueueCommand function now takes a TrackBlockedUser flag that is passed down during recursive enqueueing. This prevents excessive growth of Cmd->MBlockedUsers in long dependency chains by tracking a blocked command only on its immediate host task dominator in the dependency tree, thereby reducing notification time upon command completion.
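To illustrate the growth described above, here is a minimal sketch of the two tracking policies on a linear chain. It is a toy model, not the scheduler code: `Command`, `buildChain`, `totalBlockedUsers`, and the `TrackIndirect` parameter are hypothetical names introduced for this example, and only the `MBlockedUsers` field name is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scheduler command; it models only the list of
// users that must be notified when this command completes.
struct Command {
  std::vector<std::size_t> MBlockedUsers; // indices of blocked commands
};

// Builds a chain 0 -> 1 -> ... -> N-1 in which every command is blocked by
// its predecessor. With TrackIndirect == true, each new command is recorded
// on every earlier command (the previous behaviour as described above).
// With TrackIndirect == false, it is recorded on its direct blocker only.
std::vector<Command> buildChain(std::size_t N, bool TrackIndirect) {
  std::vector<Command> Cmds(N);
  for (std::size_t User = 1; User < N; ++User) {
    std::size_t First = TrackIndirect ? 0 : User - 1;
    for (std::size_t Dep = First; Dep < User; ++Dep)
      Cmds[Dep].MBlockedUsers.push_back(User);
  }
  return Cmds;
}

// Sums the sizes of all MBlockedUsers lists, i.e. the total number of
// notifications that would be issued across the chain.
std::size_t totalBlockedUsers(const std::vector<Command> &Cmds) {
  std::size_t Total = 0;
  for (const Command &C : Cmds)
    Total += C.MBlockedUsers.size();
  return Total;
}
```

For the A, B, C, D chain, `buildChain(4, true)` leaves three entries on A and six entries in total, while `buildChain(4, false)` leaves one entry on A and three in total. In this model the total grows as N*(N-1)/2 with indirect tracking and as N-1 without it.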
@Nuullll hi, this seems incorrect. The reason we track indirect blocked users is that a host task enqueues its blocked users on completion. Suppose we have:
HT1
K2, depending on HT1
K3, depending on K2 (and implicitly on HT1)
Then the first enqueue of K2 and K3 fails if HT1 has not completed. In this case, if only the direct dependency K2 is enqueued on host task completion, there is nobody to enqueue K3.
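A minimal sketch of this objection, under the assumption stated in the comment that only a host task re-enqueues its blocked users on completion and a kernel re-enqueues nobody. `ToyScheduler`, `completeHostTask`, and `isK3Enqueued` are hypothetical names for this example and do not correspond to the actual scheduler API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model, not the real scheduler: completing a host task enqueues exactly
// the commands recorded as its blocked users. Kernels notify nobody.
struct ToyScheduler {
  std::map<std::string, std::vector<std::string>> BlockedUsers;
  std::set<std::string> Enqueued;

  void completeHostTask(const std::string &HostTask) {
    for (const std::string &User : BlockedUsers[HostTask])
      Enqueued.insert(User);
  }
};

// Models HT1 <- K2 <- K3, where the first enqueue of K2 and K3 has already
// failed because HT1 was not complete. Returns whether K3 gets enqueued once
// HT1 completes.
bool isK3Enqueued(bool TrackIndirect) {
  ToyScheduler S;
  S.BlockedUsers["HT1"] = {"K2"};        // direct blocked user
  if (TrackIndirect)
    S.BlockedUsers["HT1"].push_back("K3"); // indirect blocked user
  S.completeHostTask("HT1");
  return S.Enqueued.count("K3") != 0;
}
```

In this model `isK3Enqueued(true)` returns true and `isK3Enqueued(false)` returns false, so dropping indirect users would need some other path that enqueues K3 after K2.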
This PR seems to be causing build hangs, please fix before rerunning :)