batchtools
btlapply submitting to only a small number of nodes in Slurm
Hi,
I have a recurring problem with btlapply, regardless of the code I run inside it. I'm not sure whether it's a bug in batchtools, a problem with the way I use it, or an error in my Slurm configuration.
I have 16 nodes, each with 40 CPUs. Whatever I submit using btlapply, it never uses more than 1 or 2 nodes, and even the second node is only half used.
sinfo reports all the other nodes as idle.
Running srun -N 16 hostname correctly prints the hostname of each node, so I can send jobs to all the nodes.
Any ideas or suggestions on how to fix this problem, or where to look?