[FIX] handle race conditions that could lead to a job running twice
Do not wait for locks, and do not start jobs that are not in the expected state.
In a nutshell, this PR replaces two SELECT FOR UPDATE statements with SELECT FOR UPDATE SKIP LOCKED. If the job to run is already locked or not in the expected state, there is no need to wait: it means the job is already being executed by another worker.
Also, since there is a commit between the check that the job is in the enqueued state (followed by setting it to started) and the actual start of execution, there was a window in which two workers could start the same job in some rare situations. This PR should avoid this case.
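For readers who want to see the idea in code, here is a minimal sketch of the pattern described above. It is not the code from this PR. The table and column names (`queue_job`, `uuid`, `state`), the helper names `lock_job_if_in_state` and `run_job_once`, and the Odoo-style cursor `cr` exposing `execute`, `fetchone` and `commit` are all assumptions made for illustration.

```python
# Illustrative sketch only: names below are assumptions, not the PR's code.
LOCK_SQL = """
    SELECT uuid
      FROM queue_job
     WHERE uuid = %s
       AND state = %s
       FOR UPDATE SKIP LOCKED
"""


def lock_job_if_in_state(cr, job_uuid, expected_state):
    """Return True if the job row was locked while in ``expected_state``.

    No row comes back when the job is missing, in another state, or
    already locked by another transaction. SKIP LOCKED makes the query
    return immediately instead of waiting in that last case.
    """
    cr.execute(LOCK_SQL, (job_uuid, expected_state))
    return cr.fetchone() is not None


def run_job_once(cr, job_uuid, perform):
    """Call ``perform`` only if this worker gets the job at both checks."""
    if not lock_job_if_in_state(cr, job_uuid, "enqueued"):
        return False  # locked by another worker, or no longer enqueued
    cr.execute(
        "UPDATE queue_job SET state = %s WHERE uuid = %s",
        ("started", job_uuid),
    )
    cr.commit()  # publishes 'started', but also releases the row lock
    # Lock again after the commit, so that a second worker reaching this
    # point for the same job gets no row back and gives up without waiting.
    if not lock_job_if_in_state(cr, job_uuid, "started"):
        return False
    perform()
    return True
```

In this sketch a worker that gets no row back simply returns, on the assumption that whoever holds the lock is the one running the job.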
Maybe fixes #858
Hi @guewen, some modules you are maintaining are being modified, check this out!
@thomaspaulb if you have a reproducer for #858, you may want to test this, possibly in combination with #853
BTW, for readability, I think the first part of _try_perform_job (until job.lock()) should be extracted from that function. That would make the flow easier to understand.