Backfill sessions are not sequential after unpause
When you pause and then un-pause a workflow, the backfill behavior switches to parallel backfill rather than the expected sequential backfill. For example, a workflow with a one-minute interval schedule that is paused for 10 minutes will run 10 parallel sessions as soon as it is un-paused. In contrast, a backfill command with the count set to 10 runs those 10 backfill sessions sequentially. I believe the expected behavior is to run sessions sequentially, giving the un-pause operation a chance to cycle through all the session times that were missed. This seems to be a bug. The impact is that multiple jobs running in parallel fail due to resource dependencies.
Digdag server version: 0.9.31
Database: PostgreSQL
Log/archive storage: S3
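For concreteness, the setup can be reproduced with a minimal workflow like the following sketch (the file name, task name, and script are illustrative, not from the original report):

```yaml
# example.dig -- minimal workflow illustrating the reported setup
timezone: UTC

schedule:
  minutes_interval>: 1   # schedule a session every minute

+process:
  # hypothetical task; assume it must not run concurrently with itself
  sh>: ./process_batch.sh
```

Pausing this workflow for 10 minutes and then un-pausing it starts the 10 missed sessions in parallel, as described above, while backfilling the same 10 sessions with the backfill command runs them one by one.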
Hello, @jaymed
IIUC, Digdag executes the backfilled tasks in parallel when the server is un-paused; it's not a bug.
Also, the Digdag backfill command does not guarantee sequential execution. The current implementation just happens to backfill sessions one by one.
If your task depends on another task, you may use the require> operator, like below.
For example:

```yaml
timezone: UTC

schedule:
  monthly>: 1,09:00:00

+depend_on_all_daily_workflow_in_month:
  loop>: ${moment(last_session_time).daysInMonth()}
  _do:
    require>: daily_workflow
    session_time: ${moment(last_session_time).add(i, 'day')}
```
Thank you for your response. Unfortunately, it does not address my issue. I am really just looking for a simple way to keep two sessions of the same workflow from colliding. There are options to skip a session if it runs over time and collides with the following session, but there is no option to wait. This would be great to have, since it would allow workflows to catch up after some time.
Hello, @jaymed
Could you tell us an example workflow? I can't imagine your problem.
If your task depends on another task, the require> operator may be useful.
If skipping backfill sessions is acceptable, skip_delayed_by may help.
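For reference, these options sit alongside the schedule type in the schedule block. A sketch (the daily time and the one-hour threshold are arbitrary values, not from this thread):

```yaml
schedule:
  daily>: 09:00:00
  # skip a new session while a previous session is still running
  skip_on_overtime: true
  # skip sessions whose scheduled time is more than 1 hour in the past
  skip_delayed_by: 1h
```

Note that both options skip sessions rather than waiting, which is exactly the behavior the original poster wants to avoid.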
Please consider the following timeline diagram of a sample workflow. This sample workflow does not depend on any other workflow. Each workflow session must run to completion before the next session starts. In this diagram, the workflow is paused after session 2 and un-paused sometime after session 7 would have been scheduled. What I would like to see is for sessions 3-7 to be backfilled sequentially. The problem is that these sessions (3-7) are backfilled in parallel when I un-pause the workflow. We cannot have multiple sessions of the same workflow running in parallel, and we want to backfill all missed sessions (3-7) one by one without skipping any. Session 8 would also need to wait until sessions 3-7 have finished. How can this be accomplished?

Hello, @jaymed
Maybe you need something like depends_on_past, which is implemented in Apache Airflow, don't you?
If session 2 runs past session 3's start time, you need session 3 to wait until session 2 completes, don't you?
It's similar to issue #615.
@hiroyuki-sato Yes, it is exactly like depends_on_past. Airflow is great, and it has many features not yet available in Digdag. One reason we are not using Airflow is that it does not offer the simplicity of Digdag; the streamlined approach is why we chose Digdag over Airflow. With that said, would it be possible to implement such a feature in Digdag? The similar issue #615 you referenced is still open.
I would like to propose introducing a wait_until_last_schedule option in schedule, as follows.
https://github.com/treasure-data/digdag/compare/master...yoyama:feature-wait_until_last_schedule
If this option is true and there is an active attempt, the next schedule is delayed; as a result, only one session runs at a time. I am still testing it, but the patch appears to work well.
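If I read the branch correctly, usage would look something like the sketch below (the option comes from the linked proposal branch, not from any released Digdag version):

```yaml
schedule:
  minutes_interval>: 1
  # proposed option: while an attempt of this workflow is still active,
  # delay the next scheduled session instead of starting it in parallel
  wait_until_last_schedule: true
```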
Could this resolve the issue?
> I would like to propose introducing wait_until_last_schedule option in schedule as follows.
This is great! Could the new option be named wait_on_overtime? It would match up with the existing skip_on_overtime schedule option.
Thank you for the naming suggestion. I am not sure which is better. As you mention, wait_on_overtime matches skip_on_overtime, but wait_until_last_schedule may be easier to understand. I hope others will comment.