python-consul-lock icon indicating copy to clipboard operation
python-consul-lock copied to clipboard

Lock timeout 2 times longer than setting.

Open franksun007 opened this issue 10 years ago • 2 comments

TL;DR: The ephemeral lock is timing out 2x the value (in seconds) of lock_timeout_seconds

I'm running into this issue as the experimental result of the lock timeout is actually 2 times longer than the timeout that set when initializing the lock.

My experiment has a basic idea as following: Each host has consul agent running and all connect to the same consul server. Each host has test.py that will be controlled to run from another host.

On the another host, I run like:


while true {
     ssh host1 'python test.py'
     ssh host2 'python test.py'
     ssh host3 'python test.py'
}

The test.py is:


 #!/usr/bin/python
 import consul
 import consul_lock
 import sys
 import os
 import time
 from consul_lock import EphemeralLock
 LOCK_KEY = "test/key/ephemeral"
 # Timeout for the lock
 ACQ_TIMEOUT = 500
 LOCK_TIMEOUT = 10
 # Create lock
 consul_lock.defaults.consul_client = consul.Consul(host='localhost', port=int(8500))
 LOCK = EphemeralLock(LOCK_KEY, acquire_timeout_ms=ACQ_TIMEOUT, lock_timeout_seconds=LOCK_TIMEOUT)
 try:
     was_acquired = LOCK.acquire(fail_hard=True)
     # when lock acquired successfully, we log the epoch time and the log as LOCK
     print str(time.time())
     print "LOCK"
 except:
     print "FAILED"

The result from the above experiment is something like below:

1438189093.83
LOCK
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
1438189111.45
LOCK
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
...........

The full log is too long to show, but the basic idea is that this happen all through the 2000 times that we trying to acquire the lock and let the lock expire after 10 sec - instead of 10 sec, lock expired after 20 sec.

I tried some other time out like 15 and 60 seconds, the following output is produced through the consul key value storage :

#15 seconds lock timeout 
{"locked_at": "2015-07-28 22:40:46.304532"}
{"locked_at": "2015-07-28 22:41:21.511981"}
{"locked_at": "2015-07-28 22:41:53.013088"} 
{"locked_at": "2015-07-28 22:42:21.489324"} 
{"locked_at": "2015-07-28 22:42:54.050118"}
{"locked_at": "2015-07-28 22:43:26.171341"}

#60 seconds lock timeout 
{"locked_at": "2015-07-28 23:00:46.736620"}
{"locked_at": "2015-07-28 23:02:45.383535"}
{"locked_at": "2015-07-28 23:04:45.568290"}
{"locked_at": "2015-07-28 23:06:47.542923"}

I wonder if this is some errors or some conflicts between the lock and consul ttl? As the expected timeout should be around the lock_timeout_seconds we set.

franksun007 avatar Jul 29 '15 18:07 franksun007

I believe that is more or less expected based on the docs and how Consul treats session and lock ttl. But I'd like to elaborate more when I have a chance to give a fuller explanation of how this library could be used. Long story short, I think using the sessions this way may be abusing the Consul API a bit and I'm not confident this approach will perform well under a high volume of traffic.

I'll open another issue to discuss possible performance implications due the current implementation of this library and Consul.

kurtome avatar Jul 31 '15 19:07 kurtome

Greetings. In my experience, the sessions indeed live almost 2* the provided TTL. But your code should normally release the locks, and you rely on the sessions only to delete locks if the process died, isn't so?

motty-stratoscale avatar Jun 02 '16 11:06 motty-stratoscale