Task is executed two times after workflow repair
Describe the bug
We noticed that a task is sometimes executed twice. After enabling debug logs, we found that after WorkflowRepairService re-queued a task, the task was, for some reason, executed two times:
INFO 2022-07-04T07:56:38,583 147034 com.netflix.conductor.core.reconciliation.WorkflowRepairService [sweeper-thread-1] Task 425d9c94-dc30-441b-b21b-73ccc5118829 in workflow d6e20f06-c884-4c25-81a4-4a7c0eb3827e re-queued for repairs
DEBUG 2022-07-04T07:56:42,994 151445 com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-1] Response: 200, {bills={partyAUTHOR={biId=5200737, status=OPEN}, partyUNIVERSITY={biId=5200740, status=OPEN}}}, task:425d9c94-dc30-441b-b21b-73ccc5118829
DEBUG 2022-07-04T07:56:42,994 151445 com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-0] Response: 200, {bills={partyAUTHOR={biId=5200738, status=OPEN}, partyUNIVERSITY={biId=5200739, status=OPEN}}}, task:425d9c94-dc30-441b-b21b-73ccc5118829
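Since we have no reliable reproduction steps, a small script can help spot further occurrences in the debug logs. The following is only a minimal sketch, assuming the log format is exactly the one shown in the lines above. The function name `find_duplicate_executions` and the regular expression are illustrative and not part of Conductor.

```python
import re
from collections import defaultdict

# Assumption: an HttpTask response line contains the worker thread name in
# square brackets and ends with "task:<uuid>", as in the excerpt above.
RESPONSE_LINE = re.compile(
    r"HttpTask \[(?P<thread>[^\]]+)\] Response: \d+,.*"
    r"task:(?P<task_id>[0-9a-f-]{36})\s*$"
)

# The two response lines from the excerpt above, copied verbatim.
SAMPLE_LOG = [
    "DEBUG 2022-07-04T07:56:42,994 151445 "
    "com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-1] "
    "Response: 200, {bills={partyAUTHOR={biId=5200737, status=OPEN}, "
    "partyUNIVERSITY={biId=5200740, status=OPEN}}}, "
    "task:425d9c94-dc30-441b-b21b-73ccc5118829",
    "DEBUG 2022-07-04T07:56:42,994 151445 "
    "com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-0] "
    "Response: 200, {bills={partyAUTHOR={biId=5200738, status=OPEN}, "
    "partyUNIVERSITY={biId=5200739, status=OPEN}}}, "
    "task:425d9c94-dc30-441b-b21b-73ccc5118829",
]


def find_duplicate_executions(log_lines):
    """Return {task_id: [thread, ...]} for tasks with more than one response line."""
    threads_by_task = defaultdict(list)
    for line in log_lines:
        match = RESPONSE_LINE.search(line)
        if match:
            threads_by_task[match.group("task_id")].append(match.group("thread"))
    # Keep only the tasks that produced a response more than once.
    return {
        task_id: threads
        for task_id, threads in threads_by_task.items()
        if len(threads) > 1
    }


if __name__ == "__main__":
    for task_id, threads in find_duplicate_executions(SAMPLE_LOG).items():
        print(f"task {task_id} executed {len(threads)} times on {threads}")
```

Run against a full log file, for example with `find_duplicate_executions(open(path))` where `path` is your own log location, this lists every task ID that produced more than one HTTP response, together with the worker threads involved.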
What does WorkflowRepairService do, and do we need it at all? Why does this happen even though we have a lock service? Thanks.
Details
Conductor version: 3.7.2
Persistence implementation: Postgres
Queue implementation: Postgres
Lock: Redis
To Reproduce
This happens from time to time; we have not found steps to reproduce it.
Expected behavior
The HTTP task must be executed only once.
I have observed the same issue. I have tried adding redis-lock and disabling the repair service. It happens when a workflow stays in the scheduled state for too long because workers are unavailable.
To reproduce, run a load test heavy enough that your instance becomes slow to pick up workflows.
Hello - are there any updates on this issue? We are running into this error with the system under load, using version 3.7.3.70.