can you share the slurm.conf you are using?

Open OhadRubin opened this issue 3 years ago • 3 comments

Hey, pinging @stas00. I'm a researcher from Tel-Aviv University and we're thinking about implementing QOS, similar to what you have on the Jean Zay cluster. It would be really helpful to see the slurm.conf you are using for your QOS settings. Thanks! Ohad

OhadRubin, Mar 27 '22 06:03

Hi @OhadRubin,

The QoS settings are not defined in slurm.conf. What would you like to know exactly?

Rémi IDRIS User Support Team

RemiLacroix-IDRIS, Mar 28 '22 09:03

I would like to reproduce the QOS settings you have here:

--qos=qos_gpu-t3 20h / 512gpus (default priority)
--qos=qos_gpu-t4 100h / 16gpus - long running slow jobs - e.g. preprocessing
--qos=qos_gpu-dev 2h / 32gpus - this is for getting allocation much faster - for dev work!

(with slightly smaller numbers haha)

OhadRubin, Mar 29 '22 07:03

The output of sacctmgr show qos -P:

Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES
qos_cpu-dev|80|00:00:00|||cluster|||1.000000|cpu=96000|||||||||02:00:00|cpu=10240||10|cpu=10240|||
qos_gpu-dev|80|00:00:00|||cluster|||1.000000|cpu=10240,gres/gpu=512|||||||||02:00:00|cpu=640,gres/gpu=32||10|cpu=640,gres/gpu=32|||
qos_cpu-t3|50|00:00:00|||cluster|||1.000000|||||||cpu=40960|||20:00:00|cpu=96000||10000|cpu=96000|||
qos_gpu-t3|50|00:00:00|||cluster|||1.000000|||||||cpu=10240,gres/gpu=512|||20:00:00|cpu=20480,gres/gpu=1024||10000|cpu=20480,gres/gpu=1024|||
qos_cpu-t4|40|00:00:00|||cluster|||1.000000|cpu=10240||||||cpu=320|||4-04:00:00|cpu=2560|||cpu=2560|||
qos_gpu-t4|40|00:00:00|||cluster|||1.000000|cpu=10240,gres/gpu=512||||||cpu=3600,gres/gpu=180|||4-04:00:00|cpu=3600,gres/gpu=180|||cpu=3600,gres/gpu=180|||

RemiLacroix-IDRIS, Apr 19 '22 08:04
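
For anyone trying to reproduce these limits on another cluster, here is a minimal sketch (not IDRIS's actual tooling) that parses the `sacctmgr show qos -P` output above and prints `sacctmgr` commands that would set the same values. The two rows are copied from the output in this thread. The column-to-option mapping (`OPTION_FOR_COLUMN`) and the helper names `parse_qos` and `to_commands` are assumptions made for illustration; check the option names against the `sacctmgr` man page for your Slurm version before running anything. QoS limits are typically only enforced when `AccountingStorageEnforce` in slurm.conf includes `limits` and `qos`, which would be worth verifying as well.

```python
# Header and two rows copied from the `sacctmgr show qos -P` output in this thread.
HEADER = (
    "Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|"
    "UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|"
    "MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|"
    "MaxJobsPA|MaxSubmitPA|MinTRES"
)
ROWS = [
    "qos_gpu-dev|80|00:00:00|||cluster|||1.000000|cpu=10240,gres/gpu=512|||||||||"
    "02:00:00|cpu=640,gres/gpu=32||10|cpu=640,gres/gpu=32|||",
    "qos_gpu-t3|50|00:00:00|||cluster|||1.000000|||||||cpu=10240,gres/gpu=512|||"
    "20:00:00|cpu=20480,gres/gpu=1024||10000|cpu=20480,gres/gpu=1024|||",
]

# Assumed mapping from output column names to `sacctmgr modify qos ... set` options.
# Only the columns that are populated in this thread are covered.
OPTION_FOR_COLUMN = {
    "Priority": "Priority",
    "GrpTRES": "GrpTRES",
    "MaxTRES": "MaxTRESPerJob",
    "MaxWall": "MaxWall",
    "MaxTRESPU": "MaxTRESPerUser",
    "MaxSubmitPU": "MaxSubmitJobsPerUser",
    "MaxTRESPA": "MaxTRESPerAccount",
}


def parse_qos(header, rows):
    """Turn pipe-delimited rows into dicts, dropping empty fields."""
    columns = header.split("|")
    parsed = []
    for row in rows:
        values = row.split("|")
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
        parsed.append({col: val for col, val in zip(columns, values) if val})
    return parsed


def to_commands(qos):
    """Build the command strings that would create one QoS (printed, never executed)."""
    name = qos["Name"]
    settings = " ".join(
        f"{option}={qos[column]}"
        for column, option in OPTION_FOR_COLUMN.items()
        if column in qos
    )
    return [
        f"sacctmgr -i add qos {name}",
        f"sacctmgr -i modify qos {name} set {settings}",
    ]


if __name__ == "__main__":
    for qos in parse_qos(HEADER, ROWS):
        for command in to_commands(qos):
            print(command)
```

The script only prints the commands, so the values can be scaled down (fewer GPUs, shorter `MaxWall`) before anything is applied to a real accounting database.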