Deadlocks due to CloseRead
nhooyr.io/websocket v1.8.10
The library uses defer defer wg.Wait() in multiple functions and that causes calls to deadlock instead of properly shutting down when connection is terminated.
One possible situation is this:
- The server calls
CloseRead - The Client sends some message
- Goroutine in
CloseReadunblocks fromReader(ctx)and callsClose -
Closeblocks ondefer c.wg.Wait()but it can't succeed due toCloseReadcallingwg.Add(1)and expecting it to be decremented withdefer wg.Done()on exit fromCloseRead. ButCloseReadcannot exit due toCloseblocking. We have a deadlock and a goroutine leak. Context is never cancelled either which means all the other code that interacts with websocket also can't properly exit.
Switching to manually calling Read in a separate goroutine and cancelling context from it solves this problem. Basically, you have to reimplement CloseRead but without blocking on waitGroup
Copy. Will rework the close code ASAP. See also #427
Actually can you provide a trace?
CloseRead done's the wait group before it closes. https://github.com/nhooyr/websocket/blob/e3a2d32f704fb06c439e56d2a85334de04b50d32/read.go#L69
But I will concede the use of a wait group in this way is certainly potentially panicky as the wg.Add(1) call could occur concurrently with a close and thus panic.
@nhooyr the problem is that c.wg.Done() is deferred and called on exit from CloseRead. But CloseRead cannot exit because it calls Close which blocks on c.wg.Wait(). Same with context cancellation.
Oh I'm blind great point! Confirmed. 🤦🏼
Thanks for reporting. I've fixed this in #427. Please review and let me know if you see anything else.
Sorry for the delay. Fixed by #427.