websocket icon indicating copy to clipboard operation
websocket copied to clipboard

[FEATURE] Reduce support issues related to concurrent writes

Open hulkingshtick opened this issue 1 year ago • 3 comments

Is there an existing feature request for this?

  • [X] I have searched the existing feature requests

Is your feature request related to a problem? Please describe.

The documentation says:

Applications are responsible for ensuring that no more than one goroutine calls the write methods (NextWriter, SetWriteDeadline, WriteMessage, WriteJSON, EnableWriteCompression, SetCompressionLevel) concurrently

To help developers detect concurrency errors in their application, the connection makes a best effort to detect concurrent calls to WriteMessage, WriteJSON and Write on the writer returned from NextWriter. When a concurrent calls is detected, the connection panics with the string concurrent write to websocket connection. This panic is better than the alternatives such as commingling messages on the underlying network connection or a nil pointer exception.

The concurrent write to websocket connection panic is the most common support issue for the package. The issue arises because developers incorrectly assume that concurrency is allowed or wrongly think they covered all the bases with regards to ensuring a single writer.

Describe the solution that you would like.

Here are options for improving the situation:

Improve the panic string

The terse panic string does not sufficiently describe the problem or a fix. Create a markdown file on this repository explaining the concurrency restriction and suggested fixes. Include a link to the markdown file in the panic string.

Improve the detector

The current detector is:

if c.isWriting {
	panic("concurrent write to websocket connection")
}
c.isWriting = true

err := c.write(w.frameType, c.writeDeadline, c.writeBuf[framePos:w.pos], extra)

if !c.isWriting {
	panic("concurrent write to websocket connection")
}
c.isWriting = false

The second concurrent writer panics, but not the first. The stack trace for the first concurrent writer may be the key to finding the bug in the application. Cause the first writer to panic using the following code. In this code, the field c.writing is an int and replaces the field c.isWriting.

if c.Writing != 0 {
            c.Writing = 2
	panic("concurrent write to websocket connection")
}
c.Writing = 1

err := c.write(w.frameType, c.writeDeadline, c.writeBuf[framePos:w.pos], extra)

if c.Writing != 1 {
	panic("concurrent write to websocket connection")
}
c.Writing = 0

(this code requires more review and testing).

Return errors instead of panicking

Return an error instead of panicking. Include the stack trace of the caller in the error string.

Describe alternatives you have considered.

I considered and rejected making the existing methods thread safe. It's possible to make the individual methods thread safe, but the SetWriteDeadline, EnableWriteCompression and SetCompressionLevel methods require synchronization at the application level to be useful. Because robust applications should call SetWriteDeadline, a robust application will implement synchronization at the application level. It follows that there's no point in making the individual methods thread safe.

I also considered changing the API to accept a WriteOptions struct for NextWriter, WriteMessage and WriteJSON. The options struct replaces the SetWriteDeadline, EnableWriteCompression and SetCompressionLevel methods. I rejected this idea because it's a breaking change to the API.

Anything else?

No response

hulkingshtick avatar Sep 02 '24 01:09 hulkingshtick

Thanks @hulkingshtick for this thorough write-up. Based on what you've written and the previous threads on this, I'd say we should probably perform a combo of Return errors instead of panicking as well as a dedicated MD Doc as you suggested in Improve the panic string to tackle this.

I'd prioritise any PRs done for this to be released asap.

jaitaiwan avatar Sep 02 '24 02:09 jaitaiwan

The more I think about it, panic is better for developers than returning an error. Because many developers ignore errors, applications will silently fail when an error is returned. The panic makes the developers aware that there is a programming error in their application.


If the panicking goroutine does not recover from the panic, then the application exits with a dump of all goroutine stack traces. It should be easy for developers find the two concurrent writers from those stack traces.

I suspect that the panicking goroutine is often a goroutine started by the net/http server. These goroutines recover from panics and print the stack trace of the panicking goroutine only. It's difficult to find the other concurrent writer with one stack trace.

This simple change can cause the other goroutine to panic:

if c.isWriting {
	c.isWriting = false // <--- add this line
	panic("concurrent write to websocket connection")
}
c.isWriting = true

err := c.write(w.frameType, c.writeDeadline, c.writeBuf[framePos:w.pos], extra)

if !c.isWriting {
	panic("concurrent write to websocket connection")
}
c.isWriting = false

Here's a demonstration of this change: https://go.dev/play/p/ItZzymA0OEf. Notice how the application prints stack traces for example1 and example2. The application prints the stack trace for example2 only when the new c.isWriting = false line is deleted.


Because the concurrency detector is only executed from functions where concurrency is not allowed, there are two possible causes of the panic: the application called the write methods concurrently or there's a bug in the concurrency detector.

If there's a bug in the concurrency detector, then there's a code path where a single threaded program will panic. I examined all of the connection code and cannot concoct a code path where a single threaded program will panic. The field isWriting starts as false and set to false on return from c.flushFrame(). From there, we can conclude that c.isWriting is false on entry to c.flushFrame().

ghost avatar Sep 02 '24 23:09 ghost

this problem still have panic: concurrent write to websocket connection

goroutine 274 [running]: github.com/gorilla/websocket.(*messageWriter).flushFrame(0xc006411950, 0x1, {0x0?, 0x0?, 0x0?}) /home/john_f/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:617 +0x4af github.com/gorilla/websocket.(*messageWriter).Close(0x503188?) /home/john_f/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:731 +0x35 github.com/gorilla/websocket.(*Conn).beginMessage(0xc0037af080, 0xc005e37500, 0x1) /home/john_f/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:480 +0x37 github.com/gorilla/websocket.(*Conn).NextWriter(0xc0037af080, 0x1) /home/john_f/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:520 +0x3f github.com/gorilla/websocket.(*Conn).WriteMessage(0x94cf60?, 0xc003507038?, {0xc0038548a0, 0x1e, 0x20}) /home/john_f/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:773 +0x136 ai-calls/internal/server/handler.handleElevenLabsMessages({0xa97770, 0xc00464b7c0}, 0xc0037af080, 0xc003805340, 0xc00466a140) /home/john_f/projects/go/calls/internal/server/handler/outbound_call_stream.go:222 +0x31a created by ai-calls/internal/server/handler.(*Handler).OutboundCallStream in goroutine 206 /home/john_f/projects/go/calls/internal/server/handler/outbound_call_stream.go:118 +0x419

lib4u avatar Apr 10 '25 12:04 lib4u

imaked this wrapper for websocket with sync.Mutex, nowthis work perfect for me and my program not get panic you can use that

package websocket

import (
	"net/http"
	"sync"

	"github.com/gorilla/websocket"
)

// The message types are defined in RFC 6455, section 11.8.
const (
	// TextMessage denotes a text data message. The text message payload is
	// interpreted as UTF-8 encoded text data.
	TextMessage = 1

	// BinaryMessage denotes a binary data message.
	BinaryMessage = 2

	// CloseMessage denotes a close control message. The optional message
	// payload contains a numeric code and text. Use the FormatCloseMessage
	// function to format a close message payload.
	CloseMessage = 8

	// PingMessage denotes a ping control message. The optional message payload
	// is UTF-8 encoded text.
	PingMessage = 9

	// PongMessage denotes a pong control message. The optional message payload
	// is UTF-8 encoded text.
	PongMessage = 10
)

type SafeUpgrader struct {
	upgrader websocket.Upgrader
}

type SafeConn struct {
	Conn  *websocket.Conn
	mutex sync.Mutex
}

func NewUpgrader() SafeUpgrader {
	return SafeUpgrader{
		upgrader: websocket.Upgrader{
			CheckOrigin: func(r *http.Request) bool {
				return true
			},
		},
	}
}

func (s SafeUpgrader) Upgrade(w http.ResponseWriter, r *http.Request, responseHeader http.Header) (*SafeConn, error) {
	conn, err := s.upgrader.Upgrade(w, r, responseHeader)
	if err != nil {
		return nil, err
	}
	return &SafeConn{Conn: conn}, nil
}

func NewWebSocketConnect(urlStr string, requestHeader http.Header) (*SafeConn, *http.Response, error) {
	conn, response, err := websocket.DefaultDialer.Dial(urlStr, requestHeader)
	if err != nil {
		return nil, nil, err
	}
	socketConn := &SafeConn{Conn: conn}
	return socketConn, response, err
}

func (s *SafeConn) WriteMessage(msgType int, data []byte) error {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	return s.Conn.WriteMessage(msgType, data)
}

func (s *SafeConn) ReadMessage() (int, []byte, error) {
	return s.Conn.ReadMessage()
}

func (s *SafeConn) Close() error {
	return s.Conn.Close()
}

lib4u avatar Apr 10 '25 18:04 lib4u

Bad idea.

ghost avatar May 12 '25 12:05 ghost