SmartSim

Support batch jobs with dragon

Open al-rigazzi opened this issue 1 year ago • 0 comments

Description

Add a DragonBatchSettings class that allows the DragonLauncher to launch batch jobs on Slurm and PBS.

Justification

The DragonLauncher should be able to launch batch jobs (i.e. scheduled jobs), not only jobs that run inside an interactive allocation. This requires a new class, DragonBatchSettings, so that users can specify batch settings for a job that must be submitted through the available scheduler (Slurm or PBS).

Implementation Strategy

DragonBatchSettings will likely need to determine which scheduler is available and, if possible, reuse SbatchSettings and QsubBatchSettings to write the script that gets submitted. Within the batch script, a Dragon server will need to be instantiated, and jobs will need to be started through it. The script will therefore not contain srun, aprun, or similar run commands; most likely it will instead execute a Python script that generates the request and submits it to the server.
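The strategy above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the SmartSim implementation: `detect_scheduler`, `render_batch_script`, and the `dragon_entrypoint.py` file name are hypothetical, and a real version would presumably delegate directive generation to SbatchSettings and QsubBatchSettings instead of formatting strings by hand.

```python
import shutil
from typing import Callable, Optional


def detect_scheduler(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Guess the scheduler from the submit command found on PATH.

    Hypothetical helper: Slurm is assumed if `sbatch` exists, PBS if `qsub` does.
    """
    if which("sbatch"):
        return "slurm"
    if which("qsub"):
        return "pbs"
    return None


def render_batch_script(
    scheduler: str,
    nodes: int,
    walltime: str,
    entrypoint: str = "dragon_entrypoint.py",  # hypothetical script name
) -> str:
    """Build a batch script whose body runs a Python entry point.

    The body deliberately contains no srun/aprun line: the entry point is
    assumed to start the Dragon server and submit requests to it.
    """
    if scheduler == "slurm":
        directives = [f"#SBATCH --nodes={nodes}", f"#SBATCH --time={walltime}"]
    elif scheduler == "pbs":
        directives = [f"#PBS -l select={nodes}", f"#PBS -l walltime={walltime}"]
    else:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    lines = ["#!/bin/bash", *directives, "", f"python {entrypoint}"]
    return "\n".join(lines) + "\n"
```

Passing `which` as a parameter keeps scheduler detection testable on machines where neither Slurm nor PBS is installed.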

Acceptance Criteria

  • [ ] Write design doc and submit it to team
  • [ ] make test-full passes

al-rigazzi • Mar 23 '24 14:03