submitit
Python 3.8+ toolbox for submitting jobs to Slurm
Hello! I am using Slurm on 4-GPU servers with submitit and I keep getting this unexpected srun error: ``` submitit WARNING (2022-08-12 02:06:11,453) - Caught signal SIGUSR2 on SERVER_NAME: this...
Hi all! In the context of cluster computing, it is sometimes necessary to have jobs run with a local Python environment and not the one from the central...
I've successfully used `submitit` to submit jobs to our Slurm cluster, and overall the library works great. However, I'm often faced with a situation where I need to work locally...
Hi, I am trying to train on an AWS EC2 G5 node with eight A10G GPUs. I am running into a CUDA out-of-memory issue with an error message `Tried to...
Hi, I'm trying to submit a huge job array to multiple partitions such as dev1, dev2, and so on. For example: ``` executor = submitit.AutoExecutor(folder='log') executor.update_parameters(slurm_partition="dev1,dev2,dev3", slurm_array_parallelism=50000) jobs = []...
Hi :) Thanks for creating this awesome open source repo, it helps me a lot! I wrote a function that tracks jobs' status with a progress bar. Perhaps it will...
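The poster's function is not shown in this excerpt. As a rough idea of what such a tracker could look like, here is a minimal sketch that polls a list of jobs and redraws a text progress bar. It assumes only that each job object exposes a `done()` method returning a bool (as submitit `Job` objects do); `JobLike`, `render_bar`, and `track_jobs` are hypothetical names, not part of submitit.

```python
import sys
import time
from typing import Iterable, Protocol, TextIO


class JobLike(Protocol):
    """Hypothetical protocol: anything with a done() -> bool method."""

    def done(self) -> bool: ...


def render_bar(finished: int, total: int, width: int = 20) -> str:
    """Return a text bar such as '[##------] 1/4'."""
    # An empty job list is rendered as a full bar.
    filled = width * finished // total if total else width
    return "[" + "#" * filled + "-" * (width - filled) + f"] {finished}/{total}"


def track_jobs(
    jobs: Iterable[JobLike],
    poll_seconds: float = 1.0,
    out: TextIO = sys.stderr,
) -> None:
    """Poll the jobs until all report done(), redrawing the bar in place."""
    job_list = list(jobs)
    while True:
        finished = sum(1 for job in job_list if job.done())
        # '\r' returns to the start of the line so the bar overwrites itself.
        out.write("\r" + render_bar(finished, len(job_list)))
        out.flush()
        if finished == len(job_list):
            out.write("\n")
            return
        time.sleep(poll_seconds)
```

With submitit, this could be called as `track_jobs(jobs)` on the list returned when submitting. The sketch counts a failed job as finished, since it only asks whether a job is done and does not inspect its result.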
Hi, is it possible to specify the node list (nodelist) somehow, especially on Slurm?
Noticed this while trying to make my own plugin/Executor... `rstrip` can remove too much!
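The excerpt does not show which string was affected, so the following is only an illustration of the general pitfall. `str.rstrip(chars)` treats its argument as a set of characters to strip, not as a suffix, so it keeps removing trailing characters as long as they appear in that set. The name `auto_executor` and the helper `strip_suffix` are hypothetical examples, not taken from the submitit code base.

```python
def strip_suffix(text: str, suffix: str) -> str:
    """Remove exactly one trailing `suffix` if present (works on Python 3.8)."""
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


name = "auto_executor"  # hypothetical name, for illustration only

# rstrip strips any trailing run of the characters {_, e, x, c, u, t, o, r},
# so it also eats the "uto" of "auto".
stripped_chars = name.rstrip("_executor")

# Suffix removal stops after the exact suffix.
stripped_suffix = strip_suffix(name, "_executor")

print(stripped_chars)   # a
print(stripped_suffix)  # auto
```

On Python 3.9 and later, `str.removesuffix` does the same as the helper; the explicit `endswith` check is used here because the package targets Python 3.8+.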