Connection timeout is not set to client._session_timeout / 1000 / len(client.hosts) in all places
Expected Behavior
The client should wait at most client._session_timeout / 1000 / len(client.hosts) seconds before moving on to the next zookeeper server.
Actual Behavior
We are using kazoo to connect each of our clients to one of the 3 zookeeper servers that form a cluster. We set timeout=120s when we instantiate the kazoo.client.KazooClient class, so client._session_timeout = 120000.
Below is the code from kazoo that sets client._session_timeout: self._session_timeout = int(timeout * 1000)
Our goal is a negotiated session timeout of 120s and a connection timeout of 120s / 3 = 40s. The kazoo code is inconsistent here: in some places it uses timeout=client._session_timeout / 1000.0, while in others it uses timeout=client._session_timeout / 1000.0 / len(client.hosts).
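To make the two values concrete, here is a minimal sketch of the arithmetic. It is not kazoo code: the helper name per_host_connect_timeout and the host names are hypothetical, and only the formula itself comes from the snippets quoted in this report.

```python
def per_host_connect_timeout(session_timeout_ms, hosts):
    """Seconds to wait on one host when the session timeout is split across all hosts."""
    if not hosts:
        raise ValueError("hosts must not be empty")
    return session_timeout_ms / 1000.0 / len(hosts)


# Same conversion kazoo applies to the timeout argument (seconds -> milliseconds).
session_timeout_ms = int(120 * 1000)

# Hypothetical host list standing in for our 3-server cluster.
hosts = [("zk1", 2181), ("zk2", 2181), ("zk3", 2181)]

per_host = per_host_connect_timeout(session_timeout_ms, hosts)  # value we expect
undivided = session_timeout_ms / 1000.0                         # value we observe
```

With our settings, per_host is the 40s we expect and undivided is the 120s we actually see.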
In all our zookeeper servers, in /conf/zoo.cfg, we have set maxSessionTimeout=120s. After negotiation, indeed, negotiated_session_timeout is 120s.
We add an iptables rule to isolate the zookeeper leader. The remaining 2 zookeeper servers close the sessions with all the clients and perform leader election. From what I have read, this is the expected behavior of zookeeper servers when the leader goes down.
kazoo then tries to connect to the next zookeeper server from the local server list (we shuffle the zookeeper server list). Sometimes, kazoo will try to connect to the isolated zookeeper server. When kazoo tries to connect to any zookeeper server, this code is called:

    with self._socket_error_handling():
        self._socket = self.handler.create_connection(
            address=(hostip, port),
            timeout=client._session_timeout / 1000.0,
            use_ssl=self.client.use_ssl,
            keyfile=self.client.keyfile,
            certfile=self.client.certfile,
            ca=self.client.ca,
            keyfile_password=self.client.keyfile_password,
            verify_certs=self.client.verify_certs,
        )
As we can see, timeout is 120s. kazoo waits for 120s and then moves to another zookeeper server. Our expectation is to wait for 40s.
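The behavior we expect can be sketched outside of kazoo with the standard library. This is only an illustration under stated assumptions: connect_first_available is a hypothetical helper, not part of kazoo, and it uses socket.create_connection in place of kazoo's handler.create_connection.

```python
import socket


def connect_first_available(hosts, session_timeout_s,
                            connect=socket.create_connection):
    """Try each host in order, waiting at most session_timeout_s / len(hosts) on each.

    Returns (socket, address) for the first host that accepts the connection,
    or re-raises the last connection error if every host fails.
    """
    if not hosts:
        raise ValueError("hosts must not be empty")
    per_host_timeout = session_timeout_s / float(len(hosts))
    last_error = None
    for address in hosts:
        try:
            return connect(address, timeout=per_host_timeout), address
        except (socket.timeout, socket.error) as exc:
            # Unreachable or isolated server: remember the error, try the next host.
            last_error = exc
    raise last_error
```

With 3 hosts and a 120s session timeout, each attempt is bounded by 40s, so one full pass over the server list takes at most 120s in total instead of up to 120s per unreachable host.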
I have added extra log lines in protocol/connection.py; they start with 'anda:'. The session ids/passwords shown here are dummy values.
Logs with logging in DEBUG mode
07/08/2024 03:12:51 AM Using session_id: XXXX session_passwd: YYYY
07/08/2024 03:12:51 AM anda: before create_connection
07/08/2024 03:14:51 AM Connection dropped: socket connection error: None
07/08/2024 03:14:51 AM anda: in _connect_loop after _connect_attempt host =
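The gap between the 'before create_connection' line and the 'Connection dropped' line can be checked with a small script. This is just a sketch for reading the timestamps above; the format string is my assumption about how these log lines are formatted.

```python
from datetime import datetime

# Assumed format of the timestamps in the log lines above.
LOG_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def seconds_between(start, end):
    """Return the number of seconds elapsed between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


elapsed = seconds_between("07/08/2024 03:12:51 AM", "07/08/2024 03:14:51 AM")
```

Here elapsed comes out to 120 seconds, which matches the undivided session timeout rather than the 40s we expect.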
Logs when I divide the timeout value by len(client.hosts). Below is the modified code:
    with self._socket_error_handling():
        self._socket = self.handler.create_connection(
            address=(hostip, port),
            timeout=client._session_timeout / 1000.0 / len(client.hosts),
            use_ssl=self.client.use_ssl,
            keyfile=self.client.keyfile,
            certfile=self.client.certfile,
            ca=self.client.ca,
            keyfile_password=self.client.keyfile_password,
            verify_certs=self.client.verify_certs,
        )
07/09/2024 04:10:00 AM Connection dropped: socket connection broken
07/09/2024 04:10:00 AM Transition to CONNECTING
07/09/2024 04:10:00 AM Zookeeper connection lost
07/09/2024 04:10:00 AM anda: in _connect_loop after _connect_attempt host =
Specifications
- Kazoo version: 2.7.0
- Zookeeper version: 3.8.4
- Python version: Python 2.7
- OS: RHOSP 8.4
Hello. Gentle reminder :) Can someone please take a look? Thank you
Hi there, thanks for reporting this.
This is, in fact, a known issue and we are already working on a fix for it. See #685 which dissociates network connection timeouts from the logical session timeout.
Cheers,
On Mon, Jul 15, 2024, 01:58 Anda Nicolae wrote:
Hello. Gentle reminder :) Can someone please take a look? Thank you