Task is executed twice after workflow repair
Describe the bug We noticed that a task is sometimes executed twice. After enabling debug logs, we found that once WorkflowRepairService had re-queued the task, the task was executed two times:
INFO 2022-07-04T07:56:38,583 147034 com.netflix.conductor.core.reconciliation.WorkflowRepairService [sweeper-thread-1] Task 425d9c94-dc30-441b-b21b-73ccc5118829 in workflow d6e20f06-c884-4c25-81a4-4a7c0eb3827e re-queued for repairs
DEBUG 2022-07-04T07:56:42,994 151445 com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-1] Response: 200, {bills={partyAUTHOR={biId=5200737, status=OPEN}, partyUNIVERSITY={biId=5200740, status=OPEN}}}, task:425d9c94-dc30-441b-b21b-73ccc5118829
DEBUG 2022-07-04T07:56:42,994 151445 com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-0] Response: 200, {bills={partyAUTHOR={biId=5200738, status=OPEN}, partyUNIVERSITY={biId=5200739, status=OPEN}}}, task:425d9c94-dc30-441b-b21b-73ccc5118829
What does WorkflowRepairService do, and do we need it at all? Why does this happen even though we have a lock service configured? Thanks.
Details
Conductor version: 3.7.2
Persistence implementation: Postgres
Queue implementation: Postgres
Lock: Redis
To Reproduce This happens from time to time; we have not found steps to reproduce it.
Expected behavior HTTP task must be executed only once.
The original issue was opened in conductor-community: https://github.com/Netflix/conductor-community/issues/70 but nobody has responded in months.
Hi @astelmashenko, WorkflowRepairService checks for the taskId before pushing anything into the queue. Are you using locks in your configuration? There is a high chance that workflow execution is not guarded by locks, so the task may be picked up by two different threads.
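To illustrate the race described above, here is a minimal sketch in plain Java. It is not Conductor's code: `ClaimOnceDemo`, `executeOnce`, and `race` are hypothetical names, and the in-memory set only stands in for a real distributed lock such as Redis. Two workers pick up the same task id, but only the first one to claim it runs the action.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ClaimOnceDemo {
    // Task ids already claimed by some worker.
    // Hypothetical in-memory stand-in for a distributed lock.
    private static final Set<String> CLAIMED = ConcurrentHashMap.newKeySet();

    /** Runs the action only if this caller is the first to claim taskId; returns whether it ran. */
    static boolean executeOnce(String taskId, Runnable action) {
        // Set.add is atomic here: exactly one caller per task id sees 'true'.
        if (!CLAIMED.add(taskId)) {
            return false;
        }
        action.run();
        return true;
    }

    /** Races the given number of worker threads on one task id; returns how often the action ran. */
    static int race(String taskId, int workers) throws InterruptedException {
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                executeOnce(taskId, executions::incrementAndGet);
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return executions.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Two workers, same task id as in the logs above.
        System.out.println("executions = " + race("425d9c94-dc30-441b-b21b-73ccc5118829", 2));
    }
}
```

Without the claim check, both workers would run the action, which matches the two HttpTask responses logged for the same task id.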
@manan164, yes, we are using a lock (Redis). What I have in mind is an upgrade of Conductor. For example, we fix something in our custom task and redeploy Conductor while thousands of workflows are running. How does it stop? For example: stop the decider first, wait for all running tasks to complete, close connections, and shut down Conductor. The question: is the shutdown process deterministic, and is there evidence that it shuts down gracefully?
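To make the ordering I am asking about concrete, here is a minimal sketch using plain `java.util.concurrent` executors. It is an assumption about what a graceful shutdown could look like, not Conductor's actual shutdown code; `GracefulShutdownSketch`, `shutdownInOrder`, and the `decider`, `workers`, and `connections` parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {

    /** Stops components in a fixed order and returns the steps that were taken. */
    static List<String> shutdownInOrder(ExecutorService decider,
                                        ExecutorService workers,
                                        AutoCloseable connections,
                                        long timeoutSeconds) throws Exception {
        List<String> steps = new ArrayList<>();

        // 1. Stop the decider first so it accepts no new decide work.
        decider.shutdown();
        steps.add("decider stopped");

        // 2. Let running tasks finish, but only wait for a bounded time.
        workers.shutdown();
        if (workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
            steps.add("workers drained");
        } else {
            workers.shutdownNow(); // interrupt whatever is still running
            steps.add("workers interrupted");
        }

        // 3. Close connections only after no task can use them any more.
        connections.close();
        steps.add("connections closed");
        return steps;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService decider = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        // A short task that is still running when shutdown begins.
        workers.submit(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(shutdownInOrder(decider, workers, () -> {}, 5));
    }
}
```

If the workers do not drain within the timeout, the sketch falls back to interrupting them, which is exactly the case where a task could later be re-queued and run again.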