Lock timeout 2 times longer than setting.
TL;DR: The ephemeral lock is timing out 2x the value (in seconds) of lock_timeout_seconds
I'm running into this issue as the experimental result of the lock timeout is actually 2 times longer than the timeout that set when initializing the lock.
My experiment has a basic idea as following: Each host has consul agent running and all connect to the same consul server. Each host has test.py that will be controlled to run from another host.
On the another host, I run like:
while true {
ssh host1 'python test.py'
ssh host2 'python test.py'
ssh host3 'python test.py'
}
The test.py is:
#!/usr/bin/python
import consul
import consul_lock
import sys
import os
import time
from consul_lock import EphemeralLock
LOCK_KEY = "test/key/ephemeral"
# Timeout for the lock
ACQ_TIMEOUT = 500
LOCK_TIMEOUT = 10
# Create lock
consul_lock.defaults.consul_client = consul.Consul(host='localhost', port=int(8500))
LOCK = EphemeralLock(LOCK_KEY, acquire_timeout_ms=ACQ_TIMEOUT, lock_timeout_seconds=LOCK_TIMEOUT)
try:
was_acquired = LOCK.acquire(fail_hard=True)
# when lock acquired successfully, we log the epoch time and the log as LOCK
print str(time.time())
print "LOCK"
except:
print "FAILED"
The result from the above experiment is something like below:
1438189093.83 LOCK FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED FAILED 1438189111.45 LOCK FAILED FAILED FAILED FAILED FAILED FAILED ...........
The full log is too long to show, but the basic idea is that this happen all through the 2000 times that we trying to acquire the lock and let the lock expire after 10 sec - instead of 10 sec, lock expired after 20 sec.
I tried some other time out like 15 and 60 seconds, the following output is produced through the consul key value storage :
#15 seconds lock timeout
{"locked_at": "2015-07-28 22:40:46.304532"}
{"locked_at": "2015-07-28 22:41:21.511981"}
{"locked_at": "2015-07-28 22:41:53.013088"}
{"locked_at": "2015-07-28 22:42:21.489324"}
{"locked_at": "2015-07-28 22:42:54.050118"}
{"locked_at": "2015-07-28 22:43:26.171341"}
#60 seconds lock timeout
{"locked_at": "2015-07-28 23:00:46.736620"}
{"locked_at": "2015-07-28 23:02:45.383535"}
{"locked_at": "2015-07-28 23:04:45.568290"}
{"locked_at": "2015-07-28 23:06:47.542923"}
I wonder if this is some errors or some conflicts between the lock and consul ttl? As the expected timeout should be around the lock_timeout_seconds we set.
I believe that is more or less expected based on the docs and how Consul treats session and lock ttl. But I'd like to elaborate more when I have a chance to give a fuller explanation of how this library could be used. Long story short, I think using the sessions this way may be abusing the Consul API a bit and I'm not confident this approach will perform well under a high volume of traffic.
I'll open another issue to discuss possible performance implications due the current implementation of this library and Consul.
Greetings. In my experience, the sessions indeed live almost 2* the provided TTL. But your code should normally release the locks, and you rely on the sessions only to delete locks if the process died, isn't so?