Gracefully handle Slack websocket closing
I am...
- [ ] Reporting a bug
- [x] Suggesting a new feature
- [ ] Requesting help with running my bot
- [ ] Requesting help writing plugins
- [ ] Here about something else
I am running...
Errbot version: 6.1.8
OS version: Debian (python:3.9-slim container)
Python version: 3.9
Using a virtual environment: no (using Docker instead)
Backend: Slack
Issue description
The Slack RTM websocket closing (Slack goodbye event) causes a Python error. This Python error should be handled more gracefully (ie using some try/except), and instead log a warning when the websocket is closed and reopened.
Steps to reproduce
Run a Slack bot using the master branch, and leave it running for a few hours. After 3-12hr you should get a Python error due to the Slack RTM websocket closed.
Workaround
Switch to the slackv3 backend.
Additional info
2021-06-25 22:19:23,760 ERROR errbot.backends.slack Error reading from RTM stream:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/slackclient/server.py", line 283, in websocket_safe_read
data += "{0}\n".format(self.websocket.recv())
File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 310, in recv
opcode, data = self.recv_data()
File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 327, in recv_data
opcode, frame = self.recv_data_frame(control_frame)
File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 340, in recv_data_frame
frame = self.recv_frame()
File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 374, in recv_frame
return self.frame_buffer.recv_frame()
File "/usr/local/lib/python3.9/site-packages/websocket/_abnf.py", line 361, in recv_frame
self.recv_header()
File "/usr/local/lib/python3.9/site-packages/websocket/_abnf.py", line 309, in recv_header
header = self.recv_strict(2)
File "/usr/local/lib/python3.9/site-packages/websocket/_abnf.py", line 396, in recv_strict
bytes_ = self.recv(min(16384, shortage))
File "/usr/local/lib/python3.9/site-packages/websocket/_core.py", line 449, in _recv
return recv(self.sock, bufsize)
File "/usr/local/lib/python3.9/site-packages/websocket/_socket.py", line 93, in recv
raise WebSocketConnectionClosedException(
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/errbot/backends/slack.py", line 442, in serve_once
for message in self.sc.rtm_read():
File "/usr/local/lib/python3.9/site-packages/slackclient/client.py", line 235, in rtm_read
json_data = self.server.websocket_safe_read()
File "/usr/local/lib/python3.9/site-packages/slackclient/server.py", line 300, in websocket_safe_read
raise SlackConnectionError(
slackclient.server.SlackConnectionError: Unable to send due to closed RTM websocket
2021-06-25 22:19:23,766 INFO errbot.core Disconnect callback, deactivating all the plugins.
2021-06-25 22:19:23,775 INFO errbot.backends.base Reconnecting in 1 seconds (0 attempted reconnections so far).
2021-06-25 22:19:24,777 INFO errbot.backends.slack Verifying authentication token
2021-06-25 22:19:25,107 INFO errbot.backends.slack Connecting to Slack real-time-messaging API
2021-06-25 22:19:25,802 INFO errbot.backends.slack Connected
2021-06-25 22:19:25,955 INFO errbot.core Activate internal commands
2021-06-25 22:19:25,955 INFO errbot.plugin_manager Activate bot plugins...
2021-06-25 22:19:25,955 INFO errbot.plugin_manager Activate plugin: Help.
2021-06-25 22:19:25,957 INFO errbot.core_plugins.wsview Checking Help for webhooks
2021-06-25 22:19:25,959 INFO errbot.plugin_manager Activate plugin: CommandNotFoundFilter.
2021-06-25 22:19:25,962 INFO errbot.core_plugins.wsview Checking CommandNotFoundFilter for webhooks
2021-06-25 22:19:25,963 INFO errbot.plugin_manager Activate plugin: Utils.
2021-06-25 22:19:25,964 INFO errbot.core_plugins.wsview Checking Utils for webhooks
2021-06-25 22:19:25,964 INFO errbot.plugin_manager Activate plugin: Plugins.
2021-06-25 22:19:25,967 INFO errbot.core_plugins.wsview Checking Plugins for webhooks
2021-06-25 22:19:25,967 INFO errbot.plugin_manager Activate plugin: ACLs.
2021-06-25 22:19:25,969 INFO errbot.core_plugins.wsview Checking ACLS for webhooks
2021-06-25 22:19:25,969 INFO errbot.plugin_manager Activate plugin: Health.
2021-06-25 22:19:25,971 INFO errbot.core_plugins.wsview Checking Health for webhooks
2021-06-25 22:19:25,972 INFO errbot.plugin_manager Activate plugin: Tradingpost.
2021-06-25 22:19:25,973 INFO errbot.core_plugins.wsview Checking Tradingpost for webhooks
2021-06-25 22:19:25,977 INFO errbot.plugin_manager Activate plugin: Random.
2021-06-25 22:19:25,978 INFO errbot.core_plugins.wsview Checking Random for webhooks
2021-06-25 22:19:25,978 INFO errbot.core Notifying connection to all the plugins...
2021-06-25 22:19:25,979 INFO errbot.core Plugin activation done.
Are you using plain bot scope or the new granular scopes?
Are you using plain bot scope or the new granular scopes?
I'm using the old, plain bot scope. How would that effect the gracefulness of this error? :confused:
I need to understand your setup fully so that I can attempt to replicate the issue you're having.
I was able to reproduce the issue. I'll dig into the code to see what can be done to handle this case cleanly.
I've got a tentative solution here https://github.com/errbotio/errbot/pull/1463. Would you be able to test it on your setup @torgeirl ?
I've got a tentative solution here #1463. Would you be able to test it on your setup @torgeirl ?
@nzlosh: I tested it out, but it didn't change anything. :disappointed:
It does however look like the changes are made for the SlackRTM backend, and not the regular Slack backend? Both backends reference RTM streams, but this particular error occurs in errbot/backends/slack.py. Would this solution also work for the regular Slack backend?
(It was my understanding that the SlackRTM backend is beta so once we got the regular Slack backend working again last month I switch back to it.)
You're quite right, I mistook the RTM stream errors to be related to the slack_rtm backend. The issue is the same between both backends, but the code is not the same, so the changes I've made won't apply to slack. I'll take a look at patching the slack backend when I get some free time.
OK. Thanks for taking the time to look into this!
@torgeirl I also got this issue, because reaching the maximum duration of a RTM web socket connection (8 hours), slack close the RTM connection. https://api.slack.com/events/goodbye
Do you have any update for this one? Thanks
Would you mind trying this backend https://github.com/nzlosh/err-backend-slackv3 and letting me know if you encounter the problem.
I using errbotv5.2 and slackclientv1.3.1, because our slack is huge, so rtm.start is naturally more difficult to use with Enterprise Grid and other large workspaces.
We often got the rtm_read error, after troubleshooting, we find out the bottleneck is CPU, old instance type is t3a.medium, which is often crash at errbot restart and need to wait for 6 mins to pull information from slack.
After the change to c5.large, I test 3 times restart just now, no crash anymore, faster to pull information from slack, only 4 mins
I have been receiving the same error message in 2 different bots. (One running locally and one running in a k8s cluster) Utilizing the 'Slack' backend. Unlimited mem/cpu errbot 6.1.7 slackclient 1.3.2
should i be using Slack_rtm instead?
@drahamim The websocket closing issue was resolved on the slackv3 if you'd like to try it out.
Updated the opening post with my current environment and included the slackv3 backend as a workaround to help users finding this issue when looking for answers.
I do hope this issue either can be fixed for for the slack backend or that the slackv3 backend becomes stable enough for inclusion in the main repository.
Can confirm the connection re-establishing itself is graceful handled as of Errbot v6.2.0 using the SlackV3 backend. Every few hours it logs the following info messages (INFO; not ERROR as before):
2024-01-13 21:08:57,031 INFO slack_sdk.socket_mode.builtin.client The connection seems to be stale. Disconnecting... (session id: <old-session-ID>, reason: disconnected for 25+ seconds)
2024-01-13 21:08:59,230 INFO slack_sdk.socket_mode.builtin.client The connection has been closed (session id: <old-session-ID>)
2024-01-13 21:08:59,230 INFO slack_sdk.socket_mode.builtin.client The session seems to be already closed. Reconnecting... (session id: <old-session-ID>)
2024-01-13 21:08:59,231 INFO slack_sdk.socket_mode.builtin.client Connecting to a new endpoint...
2024-01-13 21:08:59,902 INFO slack_sdk.socket_mode.builtin.client The connection has been closed (session id: <old-session-ID>)
2024-01-13 21:08:59,902 INFO slack_sdk.socket_mode.builtin.client A new session has been established (session id: <new-session-ID>)
2024-01-13 21:08:59,902 INFO slack_sdk.socket_mode.builtin.client Connected to a new endpoint...
2024-01-13 21:09:00,052 INFO slack_sdk.socket_mode.builtin.client Stopped receiving messages from a connection (session id: <old-session-ID>)
2024-01-13 21:09:00,153 INFO slack_sdk.socket_mode.builtin.client Starting to receive messages from a new connection (session id: <new-session-ID>)
2024-01-13 21:09:00,154 INFO errbot.core Activate internal commands
2024-01-13 21:09:00,155 INFO errbot.plugin_manager Activate bot plugins...
The graceful handling is immediately followed by some errors however, but that seems to be caused by the CORE_PLUGINS issues explained in #1591:
INFO errbot.core Activate internal commands
INFO errbot.plugin_manager Activate bot plugins...
ERROR errbot.plugin_manager Error loading VersionChecker.
(...)
AttributeError: 'NoneType' object has no attribute 'is_activated'
ERROR errbot.plugin_manager Error loading TextCmds.
(etc, etc, etc)