bigscience
can you share the slurm.conf you are using?
Hey, pinging @stas00. I'm a researcher from Tel-Aviv University, and we're thinking about implementing QOS, similar to what you have on the Jean Zay cluster. It would be really helpful to see the slurm.conf you are using for your QOS settings. Thanks! Ohad
Hi @OhadRubin,
The QoS settings are not defined in slurm.conf. What would you like to know exactly?
Rémi IDRIS User Support Team
I would like to reproduce the QOS settings you have here:
--qos=qos_gpu-t3 20h / 512gpus (default priority)
--qos=qos_gpu-t4 100h / 16gpus - long running slow jobs - e.g. preprocessing
--qos=qos_gpu-dev 2h / 32gpus - this is for getting allocation much faster - for dev work!
(with slightly smaller numbers haha)
The output of sacctmgr show qos -P:
Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES
qos_cpu-dev|80|00:00:00|||cluster|||1.000000|cpu=96000|||||||||02:00:00|cpu=10240||10|cpu=10240|||
qos_gpu-dev|80|00:00:00|||cluster|||1.000000|cpu=10240,gres/gpu=512|||||||||02:00:00|cpu=640,gres/gpu=32||10|cpu=640,gres/gpu=32|||
qos_cpu-t3|50|00:00:00|||cluster|||1.000000|||||||cpu=40960|||20:00:00|cpu=96000||10000|cpu=96000|||
qos_gpu-t3|50|00:00:00|||cluster|||1.000000|||||||cpu=10240,gres/gpu=512|||20:00:00|cpu=20480,gres/gpu=1024||10000|cpu=20480,gres/gpu=1024|||
qos_cpu-t4|40|00:00:00|||cluster|||1.000000|cpu=10240||||||cpu=320|||4-04:00:00|cpu=2560|||cpu=2560|||
qos_gpu-t4|40|00:00:00|||cluster|||1.000000|cpu=10240,gres/gpu=512||||||cpu=3600,gres/gpu=180|||4-04:00:00|cpu=3600,gres/gpu=180|||cpu=3600,gres/gpu=180|||
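
Since the QoS settings live outside slurm.conf, one way to reproduce them elsewhere is to read the non-empty columns of the output above and feed them back to sacctmgr. Below is a minimal Python sketch of that idea, not the procedure IDRIS uses. The helper names `parse_qos` and `to_commands` are hypothetical, and the `SET_OPTIONS` mapping from column names to `sacctmgr modify qos ... set` option names is an assumption that should be checked against the sacctmgr man page for your Slurm version. Only two rows are embedded to keep the example short.

```python
# Header and two rows copied from the `sacctmgr show qos -P` output above.
HEADER = (
    "Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|"
    "UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|"
    "GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|"
    "MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES"
)
ROWS = [
    "qos_gpu-dev|80|00:00:00|||cluster|||1.000000|cpu=10240,gres/gpu=512"
    "|||||||||02:00:00|cpu=640,gres/gpu=32||10|cpu=640,gres/gpu=32|||",
    "qos_gpu-t3|50|00:00:00|||cluster|||1.000000|||||||cpu=10240,gres/gpu=512"
    "|||20:00:00|cpu=20480,gres/gpu=1024||10000|cpu=20480,gres/gpu=1024|||",
]

# Assumed mapping: column name shown by `show qos` -> option name accepted by
# `sacctmgr modify qos <name> set`. Columns left out here (GraceTime,
# PreemptMode, UsageFactor) hold default-looking values in the output above.
SET_OPTIONS = {
    "Priority": "Priority",
    "GrpTRES": "GrpTRES",
    "MaxTRES": "MaxTRESPerJob",
    "MaxWall": "MaxWall",
    "MaxTRESPU": "MaxTRESPerUser",
    "MaxSubmitPU": "MaxSubmitJobsPerUser",
    "MaxTRESPA": "MaxTRESPerAccount",
}


def parse_qos(header, rows):
    """Return {qos_name: {column: value}} keeping only non-empty columns."""
    columns = header.split("|")
    table = {}
    for row in rows:
        fields = {c: v for c, v in zip(columns, row.split("|")) if v}
        table[fields.pop("Name")] = fields
    return table


def to_commands(name, fields):
    """Build the two sacctmgr command lines that would recreate one QoS."""
    settings = " ".join(
        f"{SET_OPTIONS[col]}={val}"
        for col, val in fields.items()
        if col in SET_OPTIONS
    )
    return [
        f"sacctmgr -i add qos {name}",
        f"sacctmgr -i modify qos {name} set {settings}",
    ]


if __name__ == "__main__":
    for name, fields in parse_qos(HEADER, ROWS).items():
        for command in to_commands(name, fields):
            print(command)
```

Read this way, qos_gpu-dev caps each user and each account at 32 GPUs (MaxTRESPU, MaxTRESPA) with 512 GPUs for the QoS as a whole (GrpTRES), while qos_gpu-t3 caps each job at 512 GPUs (MaxTRES) and sets no group-wide limit. The commands are only printed, so the values can be scaled down for a smaller cluster before running anything.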