clickhouse-operator
Keeper loses quorum when the number of replicas increases.
I updated clickhouse-keeper to 24.12-alpine. The update itself went smoothly, but after I increased the replica count from 1 to 3, the Keeper quorum was lost and the CHI went read-only.
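For context, the scale-up was a change like the following sketch, assuming the Keeper is managed as a ClickHouseKeeperInstallation (chk) resource; the resource name is an example and the exact spec layout depends on the operator version:

```yaml
apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: keeper            # example name
spec:
  configuration:
    clusters:
      - name: default
        layout:
          replicasCount: 3   # was 1; quorum was lost after this change
```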
I deleted the pod, PVC, and PV and ran SYSTEM RESTORE REPLICA.
SYSTEM RESTORE REPLICA was the cure, but only after detaching and re-attaching the tables.
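The recovery sequence above, as a sketch; pod, PVC, and table names are hypothetical and must be adjusted to your deployment:

```shell
# 1. Remove the broken Keeper replica's pod and its storage so it
#    rejoins the ensemble with a clean state (example names):
kubectl -n clickhouse delete pod chk-keeper-2
kubectl -n clickhouse delete pvc data-chk-keeper-2
# (also delete the bound PV if its reclaim policy is Retain)

# 2. On the affected ClickHouse replica, re-attach each read-only
#    table, then restore its metadata in Keeper:
clickhouse-client -q "DETACH TABLE db.events"
clickhouse-client -q "ATTACH TABLE db.events"
clickhouse-client -q "SYSTEM RESTORE REPLICA db.events"
```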
The Keeper logs contained entries like the following:
2025-03-18T12:38:14.386+03:00 2025.03.18 09:38:14.386856 [ 50 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2025-03-18T12:38:14.386+03:00 2025.03.18 09:38:14.386905 [ 50 ] {} <Fatal> BaseDaemon: (version 24.12.6.70 (official build), build id: 4C8B262CAB368B1364CE42907311068D6DCA5A0E, git hash: 834cccbc6e8c7f0db21ec500394c3a567b83482f, architecture: x86_64) (from thread 42) Received signal 11
2025-03-18T12:38:14.386+03:00 2025.03.18 09:38:14.386926 [ 50 ] {} <Fatal> BaseDaemon: Signal description: Segmentation fault
2025-03-18T12:38:14.386+03:00 2025.03.18 09:38:14.386942 [ 50 ] {} <Fatal> BaseDaemon: Address: 0x10. Access: read. Address not mapped to object.
2025-03-18T12:38:14.386+03:00 2025.03.18 09:38:14.386958 [ 50 ] {} <Fatal> BaseDaemon: Stack trace: 0x000000000ad02ad5 0x00007f1463284520 0x00000000100af9b1 0x000000001007456c 0x000000001005c207 0x000000001005d01f 0x0000000010068b10 0x0000000010069539 0x000000001001851e 0x0000000010010d1d 0x000000001001e7a3 0x000000000abda362 0x000000000abe007a 0x00007f14632d6ac3 0x00007f1463367a04
2025-03-18T12:38:14.386+03:00 2025.03.18 09:38:14.386972 [ 50 ] {} <Fatal> BaseDaemon: ########################################
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.386981 [ 50 ] {} <Fatal> BaseDaemon: (version 24.12.6.70 (official build), build id: 4C8B262CAB368B1364CE42907311068D6DCA5A0E, git hash: 834cccbc6e8c7f0db21ec500394c3a567b83482f) (from thread 42) (no query) Received signal Segmentation fault (11)
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.386994 [ 50 ] {} <Fatal> BaseDaemon: Address: 0x10. Access: read. Address not mapped to object.
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387001 [ 50 ] {} <Fatal> BaseDaemon: Stack trace: 0x000000000ad02ad5 0x00007f1463284520 0x00000000100af9b1 0x000000001007456c 0x000000001005c207 0x000000001005d01f 0x0000000010068b10 0x0000000010069539 0x000000001001851e 0x0000000010010d1d 0x000000001001e7a3 0x000000000abda362 0x000000000abe007a 0x00007f14632d6ac3 0x00007f1463367a04
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387067 [ 50 ] {} <Fatal> BaseDaemon: 0. signalHandler(int, siginfo_t*, void*) @ 0x000000000ad02ad5
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387081 [ 50 ] {} <Fatal> BaseDaemon: 1. ? @ 0x00007f1463284520
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387100 [ 50 ] {} <Fatal> BaseDaemon: 2. nuraft::raft_server::handle_append_entries(nuraft::req_msg&) @ 0x00000000100af9b1
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387121 [ 50 ] {} <Fatal> BaseDaemon: 3. nuraft::raft_server::process_req(nuraft::req_msg&, nuraft::raft_server::req_ext_params const&) @ 0x000000001007456c
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387141 [ 50 ] {} <Fatal> BaseDaemon: 4. nuraft::rpc_session::read_complete(std::shared_ptr<nuraft::buffer>, std::shared_ptr<nuraft::buffer>) @ 0x000000001005c207
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387159 [ 50 ] {} <Fatal> BaseDaemon: 5. nuraft::rpc_session::read_log_data(std::shared_ptr<nuraft::buffer>, boost::system::error_code const&, unsigned long) @ 0x000000001005d01f
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387190 [ 50 ] {} <Fatal> BaseDaemon: 6. boost::asio::detail::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::mutable_buffers_1, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, std::__bind<void (nuraft::rpc_session::*)(std::shared_ptr<nuraft::buffer>, boost::system::error_code const&, unsigned long), std::shared_ptr<nuraft::rpc_session> const&, std::shared_ptr<nuraft::buffer>&, std::placeholders::__ph<1> const&, std::placeholders::__ph<2> const&>>::operator()(boost::system::error_code, unsigned long, int) @ 0x0000000010068b10
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387224 [ 50 ] {} <Fatal> BaseDaemon: 7. boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, boost::asio::detail::read_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, boost::asio::mutable_buffers_1, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, std::__bind<void (nuraft::rpc_session::*)(std::shared_ptr<nuraft::buffer>, boost::system::error_code const&, unsigned long), std::shared_ptr<nuraft::rpc_session> const&, std::shared_ptr<nuraft::buffer>&, std::placeholders::__ph<1> const&, std::placeholders::__ph<2> const&>>, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) @ 0x0000000010069539
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387247 [ 50 ] {} <Fatal> BaseDaemon: 8. boost::asio::detail::scheduler::run(boost::system::error_code&) @ 0x000000001001851e
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387264 [ 50 ] {} <Fatal> BaseDaemon: 9. nuraft::asio_service_impl::worker_entry() @ 0x0000000010010d1d
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387292 [ 50 ] {} <Fatal> BaseDaemon: 10. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<std::__bind<void (nuraft::asio_service_impl::*)(), nuraft::asio_service_impl*>>(std::__bind<void (nuraft::asio_service_impl::*)(), nuraft::asio_service_impl*>&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000001001e7a3
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387312 [ 50 ] {} <Fatal> BaseDaemon: 11. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000000abda362
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387334 [ 50 ] {} <Fatal> BaseDaemon: 12. void* std::__thread_proxy[abi:v15007]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000000abe007a
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387345 [ 50 ] {} <Fatal> BaseDaemon: 13. ? @ 0x00007f14632d6ac3
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387352 [ 50 ] {} <Fatal> BaseDaemon: 14. ? @ 0x00007f1463367a04
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387360 [ 50 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read.
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387370 [ 50 ] {} <Information> SentryWriter: Not sending crash report
2025-03-18T12:38:14.387+03:00 2025.03.18 09:38:14.387386 [ 50 ] {} <Fatal> BaseDaemon: Report this error to https://github.com/ClickHouse/ClickHouse/issues