How to auto-reconnect when a zk node dies

Open ltagliamonte opened this issue 8 years ago • 1 comments

Hello, I'm trying to test cluster failures using kazoo. I wrote the following script:

from kazoo.client import KazooClient
from kazoo.client import KazooState
from kazoo.client import KazooRetry
import sys
import uuid
import logging

def zk_status_listener(state):
    if state == KazooState.LOST:
        print("Register somewhere that the session was lost")
    elif state == KazooState.SUSPENDED:
        print("Handle being disconnected from Zookeeper")
    else:
        print("Handle being connected/reconnected to Zookeeper")

def test_stress(zk):
    zk.ensure_path("/my/test/stress")
    while True:
        print("create znode")
        # znode data must be a byte string, so encode the payload explicitly
        zk.create("/my/test/stress/" + str(uuid.uuid4()), str(uuid.uuid4()).encode("utf-8"))

try:
    logging.basicConfig()
    zkr = KazooRetry(max_tries=-1)
    zk = KazooClient(hosts='s1,s2,s3', connection_retry=zkr) #put here the zk nodes part of the ensemble
    zk.add_listener(zk_status_listener)
    zk.start()
    test_stress(zk)

except:
    print(sys.exc_info()[0])

I let the script run and then I kill one of the zk servers and I get the following:

Handle being connected/reconnected to Zookeeper
create znode
....
create znode
create znode
Handle being disconnected from Zookeeper<class 'kazoo.exceptions.ConnectionLoss'>
WARNING:kazoo.client:Connection dropped: socket connection broken
WARNING:kazoo.client:Transition to CONNECTING

The script exits and doesn't reconnect to one of the other available zk nodes. Can somebody show me how to use the retry feature?

ltagliamonte, Jul 17 '17 22:07
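
One possible reading of the output in the question: `connection_retry` only governs how the client re-establishes its connection, while the `ConnectionLoss` shown is raised by the in-flight `zk.create` call and lands in the bare `except`, which ends the script. A minimal sketch of wrapping the command in the client's `retry` helper follows. The function names `build_client` and `create_with_retry` are hypothetical, and retrying forever (`max_tries=-1`) for commands is an assumption that may not suit every workload.

```python
import uuid


def build_client(hosts):
    """Build a client that retries both connecting and commands.

    kazoo is imported lazily so the helper below can be exercised
    without a ZooKeeper ensemble.
    """
    from kazoo.client import KazooClient
    from kazoo.retry import KazooRetry

    return KazooClient(
        hosts=hosts,
        connection_retry=KazooRetry(max_tries=-1),  # keep reconnecting
        command_retry=KazooRetry(max_tries=-1),     # assumption: retry commands forever
    )


def create_with_retry(zk, parent="/my/test/stress"):
    """Create one child znode through the client's retry helper.

    zk.retry re-issues the call when it fails with a retryable
    error such as ConnectionLoss, instead of letting it propagate.
    """
    path = parent + "/" + str(uuid.uuid4())
    value = str(uuid.uuid4()).encode("utf-8")  # znode data must be bytes
    return zk.retry(zk.create, path, value)
```

In the original script, the loop body would call `create_with_retry(zk)` instead of `zk.create(...)`. Note that a retried `create` may raise `NodeExistsError` if the first attempt succeeded on the server before the connection dropped, so callers may want to handle that case.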

I have the same problem, and I found a way to auto-reconnect. I wrote up the solution in issue 456.

In connection.py, inside def _connect_loop(self, retry):

    if len(host_ports) == 0:
        return STOP_CONNECTING

Returning STOP_CONNECTING stops the reconnect loop. An exception from RETRY_EXCEPTIONS should be raised instead, e.g.:

    raise ForceRetryError()

Then the client will keep trying to reconnect to zk even if all zk nodes die.

alhambraGod, Jul 18 '17 02:07
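
The reasoning in the comment above can be illustrated with a toy model. This is not kazoo's code: `RetryableError`, `retry_call`, `make_connect`, and the local `STOP_CONNECTING` sentinel are hypothetical stand-ins, and the host tuple is illustrative. The sketch only shows why a retry loop that reacts to exceptions gives up when the connect step returns a sentinel, but keeps going when the same step raises a retryable exception.

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for an exception listed in RETRY_EXCEPTIONS."""


STOP_CONNECTING = object()  # stand-in sentinel, not kazoo's object


def retry_call(func, max_tries=3, delay=0.0):
    """Toy retry loop: re-invoke func only when it raises RetryableError."""
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except RetryableError:
            if attempt == max_tries:
                raise
            time.sleep(delay)


def make_connect(resolutions, raise_on_empty):
    """Build a connect function that sees one host list per call."""
    pending = iter(resolutions)

    def connect():
        host_ports = next(pending)
        if len(host_ports) == 0:
            if raise_on_empty:
                raise RetryableError("no hosts resolved")
            return STOP_CONNECTING
        return host_ports[0]

    return connect


# Returning the sentinel ends the loop on the first empty host list.
gave_up = retry_call(make_connect([[], [("s1", 2181)]], raise_on_empty=False))

# Raising instead lets the loop try again and reach the second host list.
connected = retry_call(make_connect([[], [("s1", 2181)]], raise_on_empty=True))
```

Here `gave_up` is the sentinel, while `connected` is the first host from the second resolution attempt.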