ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty (3.5.x)
Check for failed/cancelled ChannelFutures before acquiring connectLock. This prevents lock contention where cleanup is unable to complete.
Signed-off-by: Michael Edgar [email protected]
@maoling, would you mind doing a review of this change?
It resolves a lock contention situation where a slow connection allows the client's SendThread to enter cleanUp in ClientCnxnSocketNetty just as the epollEventLoopGroup thread is calling the ChannelFutureListener. The send thread is able to cancel the connectFuture, however it is unable to proceed beyond closing the channel with the epollEventLoopGroup thread waiting on the lock.
The fix is to allow the epollEventLoopGroup thread to detect the cancellation without acquiring the lock.
@MikeEdgar I will take a look. Thanks for your contribution.
Thanks @maoling, @anmolnar. I would be grateful for a sanity check on the approach taken in this PR.
Hi @maoling, @anmolnar - I'm curious if you've had an opportunity to review these changes.
@MikeEdgar Sorry for our late, I'll take a look at this weekend. Thanks for your reminder
do we have a way to test this fix with an unit test ? (even with Mockito)
I am open to suggestions on an approach... Since this is dealing with thread timing, my solution to test it during development was with the debugger and breakpoints to simulate latency, which I understand is not ideal. Are there any other test cases that might deal with similar concepts?
@eolivelli , @maoling - I've finally added some tests with mocked dependencies. Please take a look. Also, this is for the 3.5 branch - should I instead replace the PR with one for master? What is the process for back-ports to 3.6, 3.7, etc.?
@maoling - please let me know if this PR is OK as-is or if it should target master (or another release version) instead.