[BUG] no warning for workerless step

Open ymubarka opened this issue 6 years ago • 5 comments

🐛 Bug Report

Describe the bug: Merlin does not report when a step isn't assigned to a worker when running in distributed mode.

To Reproduce: Leave a step unassigned to any worker in distributed mode, then check the study directory.

Expected behavior: Ideally, Merlin would report that one of the steps isn't assigned to a worker. Currently it shows no error or warning and simply doesn't run the step.

Screenshots

    resources:
        workers:
            merge_posthoc_workers:
                args: -l INFO --concurrency 36 --prefetch-multiplier 1 -Ofair
                steps: [merge_posthoc]
                batch:
                  type: slurm


study:
    - name: setup
      description: 
      run:
          cmd: 

    - name: merge_posthoc
      description: Combines the outputs of the previous step
      run:
          cmd: |
          depends: [setup]


EDIT: this is only the case when submitting the spec as a slurm job.

ymubarka avatar Apr 28 '20 20:04 ymubarka

Thank you, @ymubarka.

ben-bay avatar Apr 28 '20 20:04 ben-bay

I hate to break this to you, but I am unable to reproduce this bug. I was able to run this locally and distributed (after adding a description section).

ben-bay avatar Apr 28 '20 21:04 ben-bay

This is what I ran:

description:
    name: hi
    description: hi

merlin:
    resources:
        workers:
            merge_posthoc_workers:
                args: -l INFO --concurrency 36 --prefetch-multiplier 1 -Ofair
                steps: [merge_posthoc]


study:
    - name: setup
      description:
      run:
          cmd: echo "hi"

    - name: merge_posthoc
      description: Combines the outputs of the previous step
      run:
          cmd: echo "hi2"
          depends: [setup]

ben-bay avatar Apr 28 '20 21:04 ben-bay

I agree that this should work: the steps and the worker will all use the default queue, merlin. @ben-bay, try the case where setup has task_queue: q1 and merge_posthoc has task_queue: q2. This will probably not warn you that you have no workers for q1.

koning avatar Apr 28 '20 22:04 koning

@koning looks like you're right, this is a problem and is more reproducible.

ben-bay avatar Apr 28 '20 22:04 ben-bay
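
For reference, the kind of check being asked for could look roughly like the sketch below. It is not Merlin's implementation: `find_workerless_steps` and `DEFAULT_QUEUE` are hypothetical names, the spec is passed as a plain dict mirroring the YAML above, and it assumes a worker serves the queues of the steps it lists (with no `steps` entry treated as `all`). The example spec is the q1/q2 case @koning describes.

```python
DEFAULT_QUEUE = "merlin"  # default queue name mentioned in this thread


def find_workerless_steps(spec):
    """Return names of steps whose task queue no worker listens on.

    Hypothetical helper, not part of Merlin.
    """
    steps = spec.get("study", [])
    # Map each step name to its queue, falling back to the default queue.
    step_queue = {
        step["name"]: (step.get("run") or {}).get("task_queue", DEFAULT_QUEUE)
        for step in steps
    }
    workers = spec.get("merlin", {}).get("resources", {}).get("workers", {})
    served = set()
    for worker in workers.values():
        # Assumption: a worker without a steps list serves every step.
        for name in worker.get("steps", ["all"]):
            if name == "all":
                served.update(step_queue.values())
            elif name in step_queue:
                served.add(step_queue[name])
    # Keep spec order so warnings follow the order of the study block.
    return [name for name, queue in step_queue.items() if queue not in served]


# The case from the thread: setup on q1, merge_posthoc on q2,
# and a single worker that only lists merge_posthoc.
spec = {
    "merlin": {"resources": {"workers": {
        "merge_posthoc_workers": {"steps": ["merge_posthoc"]},
    }}},
    "study": [
        {"name": "setup", "run": {"cmd": 'echo "hi"', "task_queue": "q1"}},
        {"name": "merge_posthoc",
         "run": {"cmd": 'echo "hi2"', "task_queue": "q2", "depends": ["setup"]}},
    ],
}

for name in find_workerless_steps(spec):
    print(f"WARNING: step '{name}' has no worker assigned to its queue")
```

With both `task_queue` entries removed, every step falls back to the default queue and the helper reports nothing, which matches why the original spec ran without problems.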