CPS lock aware JTF doesn't work correctly, when RaiseTransitioningEvents is not fired
We noticed some JoinableTask/Project lock hangs due to the JTF holding a lock (and switching to UI thread) is not added to the lock service collection, even those JTFs are created with the lock aware JTF factory.
It is caused by a special optimization logic in JTF library. https://github.com/Microsoft/vs-threading/blob/022a5269613e7093c2ffb6c7a1074b8d416e268b/src/Microsoft.VisualStudio.Threading/JoinableTask.cs#L572-L575
The condition here to skip the event when synchronouslyBlockingMainThread is true is not correct. The current JTF is UI thread blocking task doesn't ensure that SwitchToUIThread won't be blocked due to several reasons:
-
the UI thread can run into a child JTF.Run on the top of the earlier task. The inner JTF.Run will not allow the task related to the outside JTF.Run to go to the UI thread by default.
-
the UI thread blocking JTF may have completed. It is very common a spin off task (either Task.Run, or Dataflow.Post can easily leak JTF, or it can be captured by many places capturing ExecutionContext, like a task continuation point, or cancellationToken registration) can retain the old JTF, which may have been completed. A completed UI thread blocking JTF doesn't provide any advantage for the later request to switch to the UI thread.
In both cases, the pending JTF will not be tracked by the lock service due to this issue, and leads product to run into a dead lock.
Your explanation makes sense. And I agree that removing the optimization would give CPS the chance to mitigate the deadlock. I can't think of any reason the optimization was required as opposed to just being an optimization. So you could try removing it and if no tests fail, then it's probably safe to add a new test to cover at least one of the scenarios you identified above and check it in.
CC: @jviau
We checked in some workarounds in 15.7 to reduce JTF leaks. Hopefully, that will reduce this issue. I can take a look the fix in JTF in 15.8.