zookeeper icon indicating copy to clipboard operation
zookeeper copied to clipboard

ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty (3.5.x)

Open MikeEdgar opened this issue 4 years ago • 8 comments

Check for failed/cancelled ChannelFutures before acquiring connectLock. This prevents lock contention where cleanup is unable to complete.

Signed-off-by: Michael Edgar [email protected]

MikeEdgar avatar Jun 14 '21 13:06 MikeEdgar

@maoling, would you mind doing a review of this change?

It resolves a lock contention situation where a slow connection allows the client's SendThread to enter cleanUp in ClientCnxnSocketNetty just as the epollEventLoopGroup thread is calling the ChannelFutureListener. The send thread is able to cancel the connectFuture, however it is unable to proceed beyond closing the channel with the epollEventLoopGroup thread waiting on the lock.

The fix is to allow the epollEventLoopGroup thread to detect the cancellation without acquiring the lock.

MikeEdgar avatar Jun 16 '21 13:06 MikeEdgar

@MikeEdgar I will take a look. Thanks for your contribution.

maoling avatar Jun 17 '21 11:06 maoling

Thanks @maoling, @anmolnar. I would be grateful for a sanity check on the approach taken in this PR.

MikeEdgar avatar Jun 21 '21 14:06 MikeEdgar

Hi @maoling, @anmolnar - I'm curious if you've had an opportunity to review these changes.

MikeEdgar avatar Jul 21 '21 00:07 MikeEdgar

@MikeEdgar Sorry for our late, I'll take a look at this weekend. Thanks for your reminder

maoling avatar Jul 23 '21 05:07 maoling

do we have a way to test this fix with an unit test ? (even with Mockito)

I am open to suggestions on an approach... Since this is dealing with thread timing, my solution to test it during development was with the debugger and breakpoints to simulate latency, which I understand is not ideal. Are there any other test cases that might deal with similar concepts?

MikeEdgar avatar Jul 23 '21 13:07 MikeEdgar

@eolivelli , @maoling - I've finally added some tests with mocked dependencies. Please take a look. Also, this is for the 3.5 branch - should I instead replace the PR with one for master? What is the process for back-ports to 3.6, 3.7, etc.?

MikeEdgar avatar Jul 13 '22 10:07 MikeEdgar

@maoling - please let me know if this PR is OK as-is or if it should target master (or another release version) instead.

MikeEdgar avatar Aug 08 '22 12:08 MikeEdgar