[BUG] no warning for workerless step
🐛 Bug Report
Describe the bug
Merlin does not report when a step isn't assigned to a worker when running in distributed mode.
To Reproduce
Steps to reproduce the behavior: don't assign a step to a worker in distributed mode, then check the study directory.
Expected behavior
Ideally, merlin would report that one of the steps isn't assigned to a worker. Currently it shows no errors or warnings and simply doesn't run the step.
Screenshots
```yaml
resources:
  workers:
    merge_posthoc_workers:
      args: -l INFO --concurrency 36 --prefetch-multiplier 1 -Ofair
      steps: [merge_posthoc]
      batch:
        type: slurm
study:
  - name: setup
    description:
    run:
      cmd:
  - name: merge_posthoc
    description: Combines the outputs of the previous step
    run:
      cmd: |
      depends: [setup]
```
EDIT: this is only the case when submitting the spec as a slurm job.
Thank you @ymubarka
I hate to break this to you, but I am unable to reproduce this bug. I was able to run this both locally and in distributed mode (after adding a description section).
This is what I ran:
```yaml
description:
  name: hi
  description: hi

merlin:
  resources:
    workers:
      merge_posthoc_workers:
        args: -l INFO --concurrency 36 --prefetch-multiplier 1 -Ofair
        steps: [merge_posthoc]

study:
  - name: setup
    description:
    run:
      cmd: echo "hi"
  - name: merge_posthoc
    description: Combines the outputs of the previous step
    run:
      cmd: echo "hi2"
      depends: [setup]
```
I agree that this should work: the steps and the worker all use the default queue, `merlin`, so the worker assigned to merge_posthoc also picks up setup. @ben-bay, try the case where setup has `task_queue: q1` and merge_posthoc has `task_queue: q2`. This will probably not warn you that you have no workers for q1.
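To make the queue argument concrete, here is a minimal sketch of the kind of pre-flight check that could produce the missing warning. The helper `find_unserved_queues` is hypothetical, not part of merlin. It works on a spec already loaded into a dict (e.g. with PyYAML) and assumes that a step's queue comes from `run.task_queue` (defaulting to `merlin`), that a worker listens on the queues of the steps named in its `steps` list, and that `steps: [all]` (also assumed when `steps` is omitted) covers every step.

```python
from typing import Any

# Assumption: steps without an explicit task_queue land on this default queue.
DEFAULT_QUEUE = "merlin"


def find_unserved_queues(spec: dict[str, Any]) -> dict[str, list[str]]:
    """Return {queue: [step names]} for every step queue no worker listens on.

    Hypothetical helper for illustration; not merlin's API.
    """
    # Map each step name to the queue its tasks would be sent to.
    step_queue = {
        step["name"]: ((step.get("run") or {}).get("task_queue") or DEFAULT_QUEUE)
        for step in (spec.get("study") or [])
    }
    merlin = spec.get("merlin") or {}
    workers = (merlin.get("resources") or {}).get("workers") or {}

    # Collect the queues that at least one worker listens on.
    served: set[str] = set()
    for worker in workers.values():
        names = (worker or {}).get("steps") or ["all"]  # assumed default
        if "all" in names:
            served.update(step_queue.values())
        else:
            served.update(step_queue[n] for n in names if n in step_queue)

    # Group the steps whose queue nobody listens on.
    unserved: dict[str, list[str]] = {}
    for name, queue in step_queue.items():
        if queue not in served:
            unserved.setdefault(queue, []).append(name)
    return unserved


# The q1/q2 case suggested above: only merge_posthoc (q2) has a worker.
spec = {
    "merlin": {"resources": {"workers": {
        "merge_posthoc_workers": {"steps": ["merge_posthoc"]},
    }}},
    "study": [
        {"name": "setup", "run": {"cmd": 'echo "hi"', "task_queue": "q1"}},
        {"name": "merge_posthoc",
         "run": {"cmd": 'echo "hi2"', "depends": ["setup"], "task_queue": "q2"}},
    ],
}
print(find_unserved_queues(spec))  # {'q1': ['setup']}
```

Checking queues rather than step names matches the behavior described above: with no `task_queue` set, both steps share the default queue, the single worker serves it, and the check returns an empty dict instead of a false warning about setup.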
@koning, looks like you're right: this is a problem, and it is more reproducible in that case.