websocket icon indicating copy to clipboard operation
websocket copied to clipboard

Deadlocks due to CloseRead

Open creker opened this issue 2 years ago • 4 comments

nhooyr.io/websocket v1.8.10

The library uses defer defer wg.Wait() in multiple functions and that causes calls to deadlock instead of properly shutting down when connection is terminated.

One possible situation is this:

  1. The server calls CloseRead
  2. The Client sends some message
  3. Goroutine inCloseRead unblocks from Reader(ctx) and calls Close
  4. Close blocks on defer c.wg.Wait() but it can't succeed due to CloseRead calling wg.Add(1) and expecting it to be decremented with defer wg.Done() on exit from CloseRead. But CloseRead cannot exit due to Close blocking. We have a deadlock and a goroutine leak. Context is never cancelled either which means all the other code that interacts with websocket also can't properly exit.

Switching to manually calling Read in a separate goroutine and cancelling context from it solves this problem. Basically, you have to reimplement CloseRead but without blocking on waitGroup

creker avatar Dec 19 '23 09:12 creker

Copy. Will rework the close code ASAP. See also #427

nhooyr avatar Dec 19 '23 09:12 nhooyr

Actually can you provide a trace?

CloseRead done's the wait group before it closes. https://github.com/nhooyr/websocket/blob/e3a2d32f704fb06c439e56d2a85334de04b50d32/read.go#L69

But I will concede the use of a wait group in this way is certainly potentially panicky as the wg.Add(1) call could occur concurrently with a close and thus panic.

nhooyr avatar Dec 19 '23 16:12 nhooyr

@nhooyr the problem is that c.wg.Done() is deferred and called on exit from CloseRead. But CloseRead cannot exit because it calls Close which blocks on c.wg.Wait(). Same with context cancellation.

creker avatar Dec 19 '23 16:12 creker

Oh I'm blind great point! Confirmed. 🤦🏼

nhooyr avatar Dec 19 '23 16:12 nhooyr

Thanks for reporting. I've fixed this in #427. Please review and let me know if you see anything else.

nhooyr avatar Apr 05 '24 23:04 nhooyr

Sorry for the delay. Fixed by #427.

nhooyr avatar Apr 07 '24 15:04 nhooyr