submitit

Python 3.8+ toolbox for submitting jobs to Slurm

Results 79 submitit issues

Hello! I am using slurm on 4-gpu servers using Submitit and I keep getting this unexpected srun error: ``` submitit WARNING (2022-08-12 02:06:11,453) - Caught signal SIGUSR2 on SERVER_NAME: this...

Hi all! In the context of cluster computing, it is sometimes necessary to have the jobs running with a local python environment and not the one from the central...

enhancement

I've successfully used `submitit` to submit jobs to our SLURM cluster, and overall the library works great. However, I'm often faced with a situation where I need to work locally...

Hi, I am trying to train on an AWS EC2 G5 node with eight A10G GPUs. I am running into a CUDA out-of-memory issue with an error message `Tried to...

Hi, I'm trying to submit a huge job array to multiple partitions such as dev1, dev2, and so on. For example: ``` executor = submitit.AutoExecutor(folder='log') executor.update_parameters(slurm_partition="dev1,dev2,dev3", slurm_array_parallelism=50000) jobs = []...

Hi :) Thanks for creating this awesome open source repo, it helps me a lot! I wrote a function that tracks jobs' status with a progress bar. Perhaps it will...

Hi, Is it possible to declare the nodeList somehow (especially on slurm)?

Noticed this while trying to make my own plugin/Executor... `rstrip` can remove too much!
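
The preview above does not say which call is affected, so the following is only a general sketch of the pitfall it names: `str.rstrip` treats its argument as a set of characters to strip, not as a suffix. The class name `RemoteExecutor` and the helper `strip_suffix` are hypothetical and are not taken from submitit.

```python
def strip_suffix(name: str, suffix: str) -> str:
    """Remove `suffix` from the end of `name` only if it is really there.

    Hypothetical helper; works on Python 3.8, where str.removesuffix
    (added in 3.9) is not available.
    """
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


# Hypothetical plugin class name, chosen so that the part before the
# suffix ends in characters that also occur in "Executor".
name = "RemoteExecutor"

# rstrip strips any trailing run of the characters E, x, e, c, u, t, o, r,
# so it keeps going past the suffix and eats "ote" as well.
too_much = name.rstrip("Executor")

# The suffix-aware helper removes exactly one trailing "Executor".
expected = strip_suffix(name, "Executor")

print(too_much)  # Rem
print(expected)  # Remote
```

On Python 3.9 and later, `name.removesuffix("Executor")` gives the same result as the helper.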

CLA Signed