ilya-da
At the moment the default Slurm KillWait of 30 seconds is used, but this may be insufficient for large parallel jobs to terminate gracefully. As a reminder, it defines the period in seconds between try...
Under some circumstances the Slurm epilog fails to clean up processes because of how the output of nvidia-smi pmon is parsed. From /var/log/slurm/prolog-epilog: + for i in $(nvidia-smi pmon -c 1 | tail -n+3 | awk...
Possible solution for #1317, which I opened. Update the KillWait parameter in slurm.conf from 30 to 120 seconds to allow for more graceful job termination. This change ensures that jobs have...
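To check which KillWait value a running cluster actually uses, something like the following sketch may help. It assumes `scontrol show config` prints a line of the form `KillWait = 30 sec`; the helper name `killwait_seconds` is hypothetical and is not part of Slurm or of this change.

```shell
# Hypothetical helper (not part of Slurm): read `scontrol show config`
# output on stdin and print the KillWait value in seconds.
# Assumes a line shaped like "KillWait                = 30 sec".
killwait_seconds() {
    awk -F'=' '$1 ~ /^KillWait[ \t]*$/ {
        gsub(/[^0-9]/, "", $2)   # keep only the digits of the value
        print $2
    }'
}
```

On a node with Slurm installed this could be run as `scontrol show config | killwait_seconds`. After the change described above, the expected setting in slurm.conf is `KillWait=120`.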
Simple resolution for issue #1315, which I opened earlier. Remove the redundant 'tail' command in the GPU process cleanup checks to ensure more accurate detection and termination of residual GPU processes. This change...
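As an illustration of parsing `nvidia-smi pmon` output without relying on a fixed line offset, here is a minimal sketch. It is not the epilog code from the repository: the helper name `pmon_pids` is hypothetical, and it assumes that header lines start with `#` and that idle GPUs report `-` in the pid column.

```shell
# Hypothetical helper (not the actual epilog code): read
# `nvidia-smi pmon -c 1` output on stdin and print one PID per line.
# Header lines are skipped by their leading "#" instead of by position,
# and rows for idle GPUs (pid shown as "-") are ignored.
pmon_pids() {
    awk '!/^#/ && $2 ~ /^[0-9]+$/ { print $2 }'
}
```

A cleanup loop could then iterate over `$(nvidia-smi pmon -c 1 | pmon_pids)`. Filtering on the `#` prefix means no data row is dropped if the number of header lines differs from what a `tail` offset assumes.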