
Pub/sub subscription does not resubscribe successfully after a "socket closed unexpectedly" error

rodmccutcheon opened this issue 1 year ago · 1 comment

Description

We found that after upgrading the node-redis client from version 3.1.2 to 4.7.0, the client fails to recover after a "socket closed unexpectedly" error. We added every available event listener and can see it attempting to reconnect, but it never reaches the ready state: [screenshot: Screenshot 2024-08-26 at 1 42 09 PM]
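To make the "reconnecting but never ready" symptom explicit rather than reading it off a screenshot, the listeners can record each lifecycle transition. The sketch below assumes only that the client is a Node `EventEmitter` emitting the node-redis v4 lifecycle events (`connect`, `ready`, `end`, `error`, `reconnecting`); `trackLifecycle` and its `stuck()` check are hypothetical helpers, not part of node-redis.

```typescript
import { EventEmitter } from "node:events";

// Lifecycle events documented for node-redis v4 clients.
const LIFECYCLE_EVENTS = ["connect", "ready", "end", "error", "reconnecting"] as const;
type LifecycleEvent = (typeof LIFECYCLE_EVENTS)[number];

// Hypothetical helper: records every lifecycle event and counts reconnect
// attempts since the client last reached "ready".
export function trackLifecycle(client: EventEmitter) {
  const history: LifecycleEvent[] = [];
  let reconnectsSinceReady = 0;

  for (const event of LIFECYCLE_EVENTS) {
    // Registering an "error" listener also stops Node from throwing on emit("error").
    client.on(event, () => {
      history.push(event);
      if (event === "reconnecting") reconnectsSinceReady++;
      if (event === "ready") reconnectsSinceReady = 0;
    });
  }

  return {
    history,
    reconnectsSinceReady: () => reconnectsSinceReady,
    // True once the client has started reconnecting but has not become ready again.
    stuck: () => reconnectsSinceReady > 0,
  };
}
```

Since node-redis v4 clients extend `EventEmitter`, calling `trackLifecycle(subscriber)` before `connect()` should work; polling `stuck()` would then flag a client that keeps emitting `reconnecting` without ever emitting `ready`.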

This happens when the system is under significant load, and it is blocking us from upgrading the node-redis client library in our production environment.

Production environment:

  • Node.js version: 18.18.1
  • Redis server version: 6.2
  • Node Redis version: 4.7.0
  • Platform: AWS ElastiCache

Example code to reproduce: https://github.com/Milanote/node-redis-bug

I couldn't write a clean assertion, but the gist is that the test case frequently sends a large (5 MB) payload until the client-output-buffer-limit is reached and the server drops the connection. The v3 version seems to reconnect and resubscribe correctly, and I can see more messages received. The v4 test case doesn't seem to resubscribe correctly and receives fewer messages overall. I've tried debugging the client code a bit but found it hard to pinpoint the exact issue.
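For context on why the connection drops at all: Redis closes a client connection once its output buffer reaches the hard limit, or stays at or above the soft limit for longer than the configured number of seconds. The per-second model below is a simplified sketch of that rule, assuming the stock redis.conf default for the pubsub class (`client-output-buffer-limit pubsub 32mb 8mb 60`); ElastiCache parameter groups may use different values, and the queue/drain rates are hypothetical inputs, not measurements from the repro.

```typescript
// Simplified, per-second model of the client-output-buffer-limit rule for a
// single subscriber connection (not Redis's actual implementation).
export interface OutputBufferLimit {
  hardBytes: number;   // disconnect as soon as the buffer reaches this size
  softBytes: number;   // disconnect if the buffer stays at or above this size...
  softSeconds: number; // ...for longer than this many seconds in a row
}

export const MB = 1024 * 1024;

// Assumed: stock redis.conf default "client-output-buffer-limit pubsub 32mb 8mb 60".
// ElastiCache parameter groups may differ.
export const DEFAULT_PUBSUB_LIMIT: OutputBufferLimit = {
  hardBytes: 32 * MB,
  softBytes: 8 * MB,
  softSeconds: 60,
};

// Returns the second at which the server would drop the subscriber,
// or null if it survives for `durationSeconds`.
export function secondsUntilDisconnect(
  queuedBytesPerSecond: number,  // bytes the server queues for this subscriber each second
  drainedBytesPerSecond: number, // bytes the subscriber/network actually reads each second
  limit: OutputBufferLimit,
  durationSeconds: number,
): number | null {
  let buffered = 0;
  let softSince: number | null = null; // first second of the current over-soft-limit stretch

  for (let t = 1; t <= durationSeconds; t++) {
    buffered = Math.max(0, buffered + queuedBytesPerSecond - drainedBytesPerSecond);

    if (buffered >= limit.hardBytes) return t;

    if (buffered >= limit.softBytes) {
      if (softSince === null) softSince = t;
      if (t - softSince > limit.softSeconds) return t;
    } else {
      softSince = null; // the soft window only counts a continuous stretch
    }
  }
  return null;
}
```

For example, `secondsUntilDisconnect(5 * MB, 1 * MB, DEFAULT_PUBSUB_LIMIT, 600)` returns `8`: a subscriber falling behind by 4 MB every second reaches the 32 MB hard limit after eight seconds, long before the 60-second soft window matters.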

It seems to hang while awaiting the resubscribe promise: https://github.com/redis/node-redis/blob/942de1f0b4868f0f6464b2e0702b621a3373c4ee/packages/client/lib/client/commands-queue.ts#L324

  • Does it hang because the timeout is reset to 0 after a successful connection? (https://github.com/redis/node-redis/blob/942de1f0b4868f0f6464b2e0702b621a3373c4ee/packages/client/lib/client/socket.ts#L164)
  • I tried setting disableOfflineQueue to false, but that didn't seem to help. Could it be that the write buffer is not cleared?
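One way to check whether the resubscribe is truly hanging rather than just slow is to race the awaited promise against a deadline, so a promise that never settles turns into an explicit error. `withDeadline` and `DeadlineExceededError` below are hypothetical debugging helpers, not node-redis APIs; applying them around the promise awaited at the linked `commands-queue.ts` line, while debugging a local copy of the client, is an assumption about how they would be used.

```typescript
// Hypothetical debugging helper (not a node-redis API).
export class DeadlineExceededError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} did not settle within ${ms} ms`);
    this.name = "DeadlineExceededError";
  }
}

// Races `promise` against a timer. If the timer wins, the returned promise
// rejects with DeadlineExceededError; the original promise is not cancelled.
export function withDeadline<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineExceededError(label, ms)), ms);
  });
  // Clear the timer whichever side wins, so it does not keep the process alive.
  const clear = () => clearTimeout(timer);
  return Promise.race([promise, deadline]).then(
    (value) => { clear(); return value; },
    (error: unknown) => { clear(); throw error; },
  );
}
```

A deadline does not fix the hang; it only makes it observable, and the underlying promise keeps running after the deadline rejects.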

Node.js Version

18.18.1

Redis Server Version

6.2

Node Redis Version

4.7.0

Platform

Linux

Logs

No response

rodmccutcheon · Aug 26 '24

I tried playing with your repo, and both versions seem to behave about the same (the "happy case" works, and the 5MB case causes the connection to die -> reconnect -> resubscribe). Anyway, writing to the socket faster than the network or server can handle is something you should avoid, and there is not much the client can do about it. If you want, we can debug this together; just ping me in the Redis Discord (my handle is @leibale).
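If the publisher does need to send large payloads, one option is backpressure on the sending side, so new publishes wait until earlier ones have completed. The sketch below is a hypothetical `BoundedPublisher` that caps the bytes handed to a `publish` function whose promise has not settled yet; `PublishFn` is an assumed signature that node-redis v4's `client.publish(channel, message)` could be adapted to.

```typescript
// Assumed signature; node-redis v4's client.publish(channel, message) returns a
// Promise and could be wrapped to fit it: (ch, msg) => client.publish(ch, msg).
export type PublishFn = (channel: string, message: string) => Promise<unknown>;

// Hypothetical helper: caps the bytes handed to `publish` whose promises have not
// settled yet. Further sends wait until earlier ones complete.
export class BoundedPublisher {
  private readonly publish: PublishFn;
  private readonly maxInFlightBytes: number;
  private inFlightBytes = 0;
  private waiters: Array<() => void> = [];

  constructor(publish: PublishFn, maxInFlightBytes: number) {
    this.publish = publish;
    this.maxInFlightBytes = maxInFlightBytes;
  }

  get pendingBytes(): number {
    return this.inFlightBytes;
  }

  async send(channel: string, message: string): Promise<void> {
    const size = Buffer.byteLength(message, "utf8");
    // Wait while this message would exceed the cap. A single oversized message
    // is still let through once nothing else is in flight.
    while (this.inFlightBytes > 0 && this.inFlightBytes + size > this.maxInFlightBytes) {
      await new Promise<void>((resolve) => this.waiters.push(() => resolve()));
    }
    this.inFlightBytes += size;
    try {
      await this.publish(channel, message);
    } finally {
      this.inFlightBytes -= size;
      // Wake every waiter; each one re-checks the cap in its while loop.
      const waiting = this.waiters;
      this.waiters = [];
      for (const wake of waiting) wake();
    }
  }
}
```

This throttles only the publishing connection. A settled `PUBLISH` means the server has accepted the message, not that every subscriber has read it, so a slow subscriber can still exceed its output buffer limit if messages keep arriving faster than it drains them.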

leibale · Sep 19 '24

This issue has been automatically marked as stale due to inactivity. It will be closed in 30 days if no further activity occurs. If you believe this issue is still relevant, please add a comment to keep it open.

github-actions[bot] · Sep 20 '25

This issue has been automatically closed due to inactivity. If you believe this issue is still relevant, please reopen it or create a new issue with updated information.

github-actions[bot] · Oct 20 '25