Replays will not display the latest records, need to restart Docker for it to work properly
Self-Hosted Version
23.8.0
CPU Architecture
x86_64
Docker Version
24.0.5
Docker Compose Version
2.20.2
Steps to Reproduce
Problems appeared less than 24 hours after installation. Replays won't display the latest data, requiring a restart, but after restarting, only the data before the restart can be seen, and the newly added records still won't display. I've tried reinstalling, but the problem persists.
Expected Result
Replays running normally
Actual Result
sentry-self-hosted-clickhouse-1 is not working properly.
docker ps
sentry-self-hosted-clickhouse-1 STATUS=Restarting (139) Less than a second ago
volumes/sentry-self-hosted_sentry-clickhouse-log/_data/clickhouse-server.err.log
2023.09.04 02:32:49.273415 [ 1 ] {} <Error> Application: Listen [::]:8123 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: -9 (version 20.3.9.70 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2023.09.04 02:32:49.273576 [ 1 ] {} <Error> Application: Listen [::]:9000 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: -9 (version 20.3.9.70 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2023.09.04 02:32:49.273669 [ 1 ] {} <Error> Application: Listen [::]:9009 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: -9 (version 20.3.9.70 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2023.09.04 02:32:49.273746 [ 1 ] {} <Error> Application: Listen [::]:9004 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: -9 (version 20.3.9.70 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2023.09.04 02:33:38.357586 [ 72 ] {} <Warning> Settings: Unknown setting database_atomic_wait_for_drop_and_detach_synchronously, skipping
2023.09.04 02:33:38.358584 [ 72 ] {} <Warning> Settings: Unknown setting database_atomic_wait_for_drop_and_detach_synchronously, skipping
2023.09.04 02:33:38.359498 [ 72 ] {} <Warning> Settings: Unknown setting database_atomic_wait_for_drop_and_detach_synchronously, skipping
2023.09.04 02:33:38.380039 [ 72 ] {a6e69f47-9585-4784-a6c2-bb9722fa23d0} <Error> executeQuery: Code: 60, e.displayText() = DB::Exception: Table default.migrations_local doesn't exist. (version 20.3.9.70 (official build)) (from 172.20.0.6:47290) (in query: SELECT group, migration_id, status FROM migrations_local FINAL WHERE group IN ('system', 'events', 'transactions', 'discover', 'outcomes', 'metrics', 'sessions', 'profiles', 'functions', 'replays', 'generic_metrics', 'search_issues')), Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x105351b0 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x8f4172d in /usr/bin/clickhouse
2. DB::Context::getTableImpl(DB::StorageID const&, std::__1::optional<DB::Exception>*) const @ 0xcfe2a24 in /usr/bin/clickhouse
3. DB::Context::getTable(DB::StorageID const&) const @ 0xcfe2bbb in /usr/bin/clickhouse
4. DB::Context::getTable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const @ 0xcfe2c7d in /usr/bin/clickhouse
5. DB::JoinedTables::getLeftTableStorage() @ 0xd454892 in /usr/bin/clickhouse
6. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>, std::__1::shared_ptr<DB::IStorage> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xd13b6d1 in /usr/bin/clickhouse
7. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xd13c619 in /usr/bin/clickhouse
8. DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xd341686 in /usr/bin/clickhouse
9. DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, DB::Context&, DB::QueryProcessingStage::Enum) @ 0xd0909b4 in /usr/bin/clickhouse
10. ? @ 0xd550655 in /usr/bin/clickhouse
11. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, bool) @ 0xd553441 in /usr/bin/clickhouse
12. DB::TCPHandler::runImpl() @ 0x9024489 in /usr/bin/clickhouse
13. DB::TCPHandler::run() @ 0x9025470 in /usr/bin/clickhouse
14. Poco::Net::TCPServerConnection::start() @ 0xe3ac69b in /usr/bin/clickhouse
15. Poco::Net::TCPServerDispatcher::run() @ 0xe3acb1d in /usr/bin/clickhouse
16. Poco::PooledThread::run() @ 0x105c3317 in /usr/bin/clickhouse
17. Poco::ThreadImpl::runnableEntry(void*) @ 0x105bf11c in /usr/bin/clickhouse
18. ? @ 0x105c0abd in /usr/bin/clickhouse
19. start_thread @ 0x76db in /lib/x86_64-linux-gnu/libpthread-2.27.so
20. __clone @ 0x12188f in /lib/x86_64-linux-gnu/libc-2.27.so
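A side note on the `Restarting (139)` status in the `docker ps` output above: by the usual convention, an exit status above 128 means the process was terminated by signal number (status - 128). For 139 that is signal 11, which is SIGSEGV on Linux. The sketch below only illustrates that arithmetic; `describe_exit_code` is a hypothetical helper name, not part of Docker or Sentry, and it does not explain why ClickHouse crashed.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit status into a human-readable description.

    Assumes the common convention that a status above 128 means
    "terminated by signal (status - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

If the container really is segfaulting in a loop, the ClickHouse error log and the host's kernel log are probably the next places to look.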
Event ID
No response
Were you able to get replays working on a previous version of Sentry, or is this the first version where you're seeing this problem?
I have used version 23.7.0, and this problem occurred after about a week. So I upgraded to 23.8.0, but this problem occurred again after 24 hours.
Now the problem is a bit different: even after rebooting, Replays won't show anything new and keeps displaying the content from 2 days ago.
I think I see this same issue where replays stop appearing after some time - restarting all containers fixes it - after the restart, replays which were previously missing now appear. For example I just noticed my most recent replays are all from 5hrs ago, so I restarted all containers and now I see replays from the past 5 hours have now all appeared. I'm yet to check to see if there's a correlation in the logs though, or to selectively restart any containers to isolate the issue. This has happened 2 or 3 times now on a new install which is about a week old. I did have to wait maybe 5 min for them all to appear, so perhaps things were queued and required some processing
I encountered the situation you mentioned before, but now even after restart, I can't see any new data.
I think I see this same issue where replays stop appearing after some time - restarting all containers fixes it - after the restart, containers which were previously missing now appear
What containers were previously missing, did they crash for some reason?
Table default.migrations_local doesn't exist might be useful here. Wondering if there was a snuba migration that wasn't performed?
@hubertdeng123 my apologies, I mistyped that - I have updated my comment now. In that sentence I said “containers” were missing when I meant “replays” were missing. Now that I have confirmed for sure that the problem reoccurs after some time and a restart fixes it in my case, I will do further diagnosis before restarting next time
Will do! Let us know what might be going wrong if it happens again.
I don't have a lot to add, except that selectively restarting zookeeper, clickhouse didn't fix it, but a restart of all containers again fixed it. This time my most recent replay was 8hrs ago, and after restart I now see dozens in the past 8hrs. In terms of the processing pipeline from receiving the replay payload to showing it in the WebUI, which other containers can/should I inspect or selectively restart next time this happens, so I can start to isolate the issue?
hi @agoddard, if zookeeper/clickhouse don't seem to be the cause I'd recommend looking at the kafka container to see if restarting that container specifically fixes the issue (and also checking the kafka logs for anything when the issue occurs).
@bmckerry when the issue occurs, the most recent Kafka container log (issue occurred around the same time as this log too) is a log cleaner message with additional detail (shown below). Usually the log cleaner messages are just 1 line, but this one is more detailed, and also looks like it deleted quite a lot, though I don't know if this is important. Also printed below is an error message from sentry-self-hosted-snuba-replays-consumer-1 so next time it happens I will try selectively restarting that container.
Restarting the Kafka, clickhouse and zookeeper containers doesn't seem to solve the issue, but it's fixed again by a full docker compose restart. 4 days worth of replays processed in the ~5min following docker-compose restart
[2023-09-11 03:24:12,212] INFO [kafka-log-cleaner-thread-0]:
Log cleaner thread 0 cleaned log __consumer_offsets-0 (dirty section = [213401, 213401])
28.4 MB of log processed in 0.4 seconds (65.9 MB/sec).
Indexed 28.4 MB in 0.3 seconds (104.4 Mb/sec, 63.1% of total time)
Buffer utilization: 0.0%
Cleaned 28.4 MB in 0.2 seconds (178.6 Mb/sec, 36.9% of total time)
Start size: 28.4 MB (218,698 messages)
End size: 0.0 MB (58 messages)
100.0% size reduction (100.0% fewer messages)
(kafka.log.LogCleaner)
Potentially relevant logs from sentry-self-hosted-snuba-replays-consumer-1:
2023-09-13 18:05:57,384 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 365, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 98, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2023-09-13 18:05:57,394 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7fd4212aea00>...
2023-09-13 18:05:57,394 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-13 18:05:57,394 Partition revocation complete.
2023-09-13 18:05:57,395 Processor terminated
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/consumer.py", line 260, in consumer
consumer.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 365, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 98, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
Interesting. What does your CPU/RAM usage look like, out of curiosity?
@hubertdeng123 super chill during normal operation. load avg ~0.5, host using 8 of 32GB ram (I'll see if I can check how much docker/sentry is using), but it doesn't seem to be sweating at all. no spikes in ram, IO, load during the approx window when it last failed.
Got it, seems strange since it seems like a connection error is being thrown. All the containers are up and running, so there aren't any crashes?
@hubertdeng123 it's back in the stale replay state, CPU, load, memory are all fine - all containers are up and no crashes. I can leave it in this state for additional troubleshooting
I've noticed the same issues as @agoddard. Replays show up in the UI fine for a while, then eventually stop. This is on release 23.7.2, but observed the same issue going back to the first self-hosted release that included replays.
Running docker compose logs snuba-replays-consumer, I saw the same http.client.RemoteDisconnected: Remote end closed connection without response @agoddard is seeing.
I found running docker compose restart snuba-replays-consumer would make replays process and start showing up in the UI.
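Until the root cause is found, the manual workaround above could be automated. The following is a minimal sketch, not an official fix. It assumes it is run from the self-hosted directory so that `docker compose` finds the project, and that the strings "Processor terminated" and "RemoteDisconnected" (both taken from the consumer logs quoted in this thread) are reliable signs of the stuck state. The function names and the 10-minute window are illustrative choices.

```python
import subprocess

# Log signatures copied from the consumer output quoted in this thread.
# Treating them as "the consumer is stuck" is an assumption.
FAILURE_MARKERS = (
    "Processor terminated",
    "RemoteDisconnected",
)


def needs_restart(log_text: str) -> bool:
    """Return True if any known failure marker appears in the log text."""
    return any(marker in log_text for marker in FAILURE_MARKERS)


def recent_logs(service: str, since: str = "10m") -> str:
    """Fetch recent logs for one compose service."""
    result = subprocess.run(
        ["docker", "compose", "logs", "--no-color", "--since", since, service],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout + result.stderr


def restart_if_stuck(service: str = "snuba-replays-consumer") -> bool:
    """Restart the service when its recent logs contain a failure marker."""
    if needs_restart(recent_logs(service)):
        subprocess.run(["docker", "compose", "restart", service], check=True)
        return True
    return False
```

Calling `restart_if_stuck()` from cron or a systemd timer would approximate the manual restart. One known limitation: the failure lines stay inside the log window after a restart, so a run shortly afterwards may restart the service a second time. This only masks the symptom; the traceback still points at the consumer's HTTP write to ClickHouse (`snuba/clickhouse/http.py`) as the place where the connection is dropped.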
thanks @fpotter I just tested restarting snuba-replays-consumer and I can confirm that fixes it for me too. Judging by the logs from that container, it's showing errors which seem to match the times when I had the issues. I'm not sure how I missed those errors in my prior searches - it looks like it disconnects (from Kafka?) and then never recovers until a container restart. I believe the "shutdown signaled" messages are when I intentionally restarted the container(s) to recover from the issue.
2023-09-08 17:42:41,804 Partition revocation complete.
2023-09-08 17:42:42,814 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 41141}
2023-09-08 17:42:47,806 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-08 17:42:47,806 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fd42108ffd0>...
2023-09-08 17:42:47,806 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fd42108ffd0> to exit...
2023-09-08 17:42:47,806 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fd42108ffd0> exited successfully, releasing assignment.
2023-09-08 17:42:47,806 Partition revocation complete.
2023-09-08 17:42:48,819 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 41141}
2023-09-13 18:05:57,384 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 365, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 98, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2023-09-13 18:05:57,394 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7fd4212aea00>...
2023-09-13 18:05:57,394 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-13 18:05:57,394 Partition revocation complete.
2023-09-13 18:05:57,395 Processor terminated
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/consumer.py", line 260, in consumer
consumer.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 365, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 98, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
%3|1694936651.474|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.18.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1694936652.474|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.18.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1694937232.477|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.18.0.11:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1694937233.477|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.18.0.11:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1694937236.635|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Failed to resolve 'kafka:9092': Name or service not known (after 157ms in state CONNECT)
2023-09-17 07:53:57,466 Shutdown signalled
%3|1694937238.563|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/1001: Connect to ipv4#172.18.0.11:9092 failed: Connection refused (after 86ms in state CONNECT)
2023-09-17 07:54:19,587 Initializing Snuba...
2023-09-17 07:54:39,127 Snuba initialization took 19.55938357487321s
2023-09-17 07:54:40,272 Initializing Snuba...
2023-09-17 07:54:47,690 Snuba initialization took 7.420066382735968s
2023-09-17 07:54:47,702 Consumer Starting
2023-09-17 07:54:47,702 Checking Clickhouse connections
2023-09-17 07:54:47,711 librdkafka log level: 6
2023-09-17 07:55:19,489 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 77317}
2023-09-17 07:55:27,848 Connection pool is full, discarding connection: clickhouse. Connection pool size: 1
2023-09-17 07:55:31,608 Connection pool is full, discarding connection: clickhouse. Connection pool size: 1
2023-09-17 07:55:33,258 Connection pool is full, discarding connection: clickhouse. Connection pool size: 1
2023-09-18 09:21:06,939 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-18 09:21:06,939 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b291f250>...
2023-09-18 09:21:06,940 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b291f250> to exit...
2023-09-18 09:21:06,955 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b291f250> exited successfully, releasing assignment.
2023-09-18 09:21:06,955 Partition revocation complete.
2023-09-18 09:21:08,291 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 104727}
2023-09-18 09:21:12,936 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-18 09:21:12,937 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b94adb80>...
2023-09-18 09:21:12,937 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b94adb80> to exit...
2023-09-18 09:21:12,960 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b94adb80> exited successfully, releasing assignment.
2023-09-18 09:21:12,960 Partition revocation complete.
2023-09-18 09:21:14,293 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 104728}
2023-09-19 08:39:25,005 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-19 08:39:25,005 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b25db550>...
2023-09-19 08:39:25,005 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b25db550> to exit...
2023-09-19 08:39:25,006 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b25db550> exited successfully, releasing assignment.
2023-09-19 08:39:25,006 Partition revocation complete.
2023-09-19 08:39:25,120 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 112729}
2023-09-19 08:39:31,008 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-19 08:39:31,009 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b54b5220>...
2023-09-19 08:39:31,009 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b54b5220> to exit...
2023-09-19 08:39:31,010 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7f31b54b5220> exited successfully, releasing assignment.
2023-09-19 08:39:31,011 Partition revocation complete.
2023-09-19 08:39:31,126 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 112730}
2023-09-21 10:42:15,886 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 365, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 98, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2023-09-21 10:42:15,904 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7f31b2911a30>...
2023-09-21 10:42:15,905 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-21 10:42:15,905 Partition revocation complete.
2023-09-21 10:42:15,907 Processor terminated
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/consumer.py", line 260, in consumer
consumer.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 365, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 98, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2023-09-25 06:08:15,752 Shutdown signalled
2023-09-25 06:08:26,740 Initializing Snuba...
Restarting sentry-self-hosted-snuba-replays-consumer-1 can indeed temporarily fix it.
Self-Hosted Version 23.9.1, sentry-self-hosted-snuba-replays-consumer-1 logs:
2023-09-26 02:47:43,844 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052b80>...
2023-09-26 02:47:43,845 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052b80> to exit...
2023-09-26 02:47:43,845 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052b80> exited successfully, releasing assignment.
2023-09-26 02:47:43,845 Partition revocation complete.
2023-09-26 02:47:45,280 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 36642}
2023-09-26 02:47:49,845 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-26 02:47:49,846 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052790>...
2023-09-26 02:47:49,846 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052790> to exit...
2023-09-26 02:47:49,846 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052790> exited successfully, releasing assignment.
2023-09-26 02:47:49,846 Partition revocation complete.
2023-09-26 02:47:50,061 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 36643}
2023-09-26 04:58:29,909 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-26 04:58:29,909 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052dc0>...
2023-09-26 04:58:29,909 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052dc0> to exit...
2023-09-26 04:58:29,918 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052dc0> exited successfully, releasing assignment.
2023-09-26 04:58:29,918 Partition revocation complete.
2023-09-26 04:58:31,223 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 41129}
2023-09-26 04:58:35,910 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-26 04:58:35,910 Closing <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052a60>...
2023-09-26 04:58:35,910 Waiting for <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052a60> to exit...
2023-09-26 04:58:35,917 <arroyo.processing.strategies.guard.StrategyGuard object at 0x7fb2d7052a60> exited successfully, releasing assignment.
2023-09-26 04:58:35,917 Partition revocation complete.
2023-09-26 04:58:36,179 New partitions assigned: {Partition(topic=Topic(name='ingest-replay-events'), index=0): 41134}
2023-09-26 11:11:39,503 Caught exception, shutting down...
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 368, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 107, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
2023-09-26 11:11:39,514 Closing <arroyo.backends.kafka.consumer.KafkaConsumer object at 0x7fb2d70522b0>...
2023-09-26 11:11:39,516 Partitions to revoke: [Partition(topic=Topic(name='ingest-replay-events'), index=0)]
2023-09-26 11:11:39,516 Partition revocation complete.
2023-09-26 11:11:39,520 Processor terminated
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/snuba", line 33, in <module>
sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/src/snuba/snuba/cli/consumer.py", line 260, in consumer
consumer.run()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 288, in run
self._run_once()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/processor.py", line 368, in _run_once
self.__processing_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 101, in poll
self.__inner_strategy.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task.py", line 55, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/guard.py", line 37, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/reduce.py", line 149, in poll
self.__next_step.poll()
File "/usr/local/lib/python3.8/site-packages/arroyo/processing/strategies/run_task_in_threads.py", line 107, in poll
result = future.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/src/snuba/snuba/consumers/strategy_factory.py", line 122, in flush_batch
message.payload.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 330, in close
self.__insert_batch_writer.close()
File "/usr/src/snuba/snuba/consumers/consumer.py", line 160, in close
self.__writer.write(
File "/usr/src/snuba/snuba/clickhouse/http.py", line 347, in write
batch.join(timeout=batch_join_timeout)
File "/usr/src/snuba/snuba/clickhouse/http.py", line 239, in join
response = self._result.result(timeout)
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
rv = real_getresponse(self, *args, **kwargs)
File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
In case this is useful for others, my band-aid fix for this has been to auto-restart the snuba-replays-consumer service with a cronjob:
*/1 * * * * (cd path/to/your/sentry/checkout && docker compose logs --tail=2 snuba-replays-consumer | grep -F 'Remote end closed connection without response' && docker compose restart snuba-replays-consumer || echo "No need to restart.") 2>&1 | systemd-cat -t replays-restart
You can view the logs from the job with:
journalctl -t replays-restart
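For anyone who prefers a script over a one-line cron pipeline, here is a rough Python sketch of the same check. It is only an illustration of the workaround above, not a fix for the underlying problem. The function names (`needs_restart`, `run_compose`, `check_and_restart`) are made up for this example; the only things taken from the thread are the `docker compose logs --tail=2` / `docker compose restart` commands, the service name, and the error string.

```python
import subprocess
from typing import Callable, Sequence

# Error text and service name copied from the cron job above.
ERROR_MARKER = "Remote end closed connection without response"
SERVICE = "snuba-replays-consumer"


def needs_restart(log_tail: str, marker: str = ERROR_MARKER) -> bool:
    """Return True if the tail of the consumer log contains the error."""
    return marker in log_tail


def run_compose(args: Sequence[str], cwd: str) -> str:
    """Run `docker compose <args>` in the checkout and return its output."""
    result = subprocess.run(
        ["docker", "compose", *args],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


def check_and_restart(
    cwd: str,
    compose: Callable[[Sequence[str], str], str] = run_compose,
) -> bool:
    """Restart the replays consumer if its last log lines show the error.

    `compose` is injectable so the logic can be exercised without Docker.
    Returns True when a restart was issued.
    """
    tail = compose(["logs", "--tail=2", SERVICE], cwd)
    if needs_restart(tail):
        compose(["restart", SERVICE], cwd)
        return True
    return False
```

Calling `check_and_restart("path/to/your/sentry/checkout")` from a small wrapper script on the same one-minute schedule should behave like the shell version. The path is a placeholder, as in the cron line.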
The exact same issue is happening to me. Doing this helps, but there is definitely something else going on.
I'm currently running into this on a self-hosted instance running 24.1.0 using the EmberJS SDK. Was a resolution ever discovered?
We have not found a resolution yet, thanks for your patience.