pgqueue icon indicating copy to clipboard operation
pgqueue copied to clipboard

locking-take occassionally grabs the same item twice

Open weavejester opened this issue 6 years ago • 3 comments

I haven't finished my investigation, but I wanted to confirm that this sort of thing shouldn't be happening. I have 8 threads, which all have the same queue object, and I'm using locking-take to take an item off the queue, which is then deleted on success, and unlocked in a finally clause.

Occassionally, I'm getting the following debug logs:

:take   {:id 14720, :thread "clojure-agent-send-off-pool-0", :time 1575506781782}
:take   {:id 14720, :thread "clojure-agent-send-off-pool-7", :time 1575506781786}
:unlock {:id 14720, :thread "clojure-agent-send-off-pool-7", :time 1575506781821}
:unlock {:id 14720, :thread "clojure-agent-send-off-pool-0", :time 1575506781839}

So in this case thread 0 performs a locking take, and gets item 14720. Shortly after, thread 7 performs a locking take, and also gets item 14720.

My understanding is a locking-take should not be able to take the same item as another thread, if it has not yet been unlocked. Is this correct, or am I misunderstanding what locking-take does?

weavejester avatar Dec 05 '19 01:12 weavejester

It looks like subsequent locking-take calls can have the same database connection, which explains why the advisory lock isn't stopping things:

:take {:id 18385, :thread "pool-10", :time 1575512351984, :db-id 8}
:take {:id 18385, :thread "pool-9",  :time 1575512351989, :db-id 8}

My reading of the source code is that there doesn't appear to be a mechanism to prevent different threads from concurrently using the same database connection. I'm also a little confused as to why the *qlocks* and *db-pool* vars are dynamic, since they are never used dynamically.

I know it's been 2 years since this project has been worked on, but would you be amenable to a PR to fix this?

weavejester avatar Dec 05 '19 03:12 weavejester

@weavejester This looks like a legitimate bug, and I'd be happy to accept a PR for this.

It looks like locking-take and locking-take-batch should check for the existence of the locked item's id not only in the query, but also within *qlocks*.

(Also, I don't recall a reason for the vars being dynamic, and, I think you're right...we don't need :^dynamic here.)

csummers avatar Dec 05 '19 10:12 csummers

Okay, I'll open up a PR of fixes. It looks like the recursive query is also broken.

weavejester avatar Dec 05 '19 17:12 weavejester