xqtl-protocol
Improvement for running per-gene/trait analyses
It is rather common that, for tasks with many substeps (10,000+), the SoS process gets interrupted on the cluster for all sorts of reasons, e.g. being kicked off a node or running out of memory.
When resuming the jobs with `-s build`, it always takes SoS quite a long time to loop through the existing output files in order to skip the corresponding completed substeps. I wonder if there is a way to optimize this behavior, such that we could start from the second step directly. However, if we specify step_2, whose input is the output of step_1, SoS will consider the input to be empty.
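One cheap workaround might be to pre-filter the per-gene input list ourselves, keeping only genes whose output file does not yet exist, so the resumed run never has to iterate over the completed substeps at all. The sketch below is a minimal illustration of that idea; the file-naming pattern (`<gene>.susie.rds`) and the function name are hypothetical, not part of the actual pipeline.

```python
from pathlib import Path
import tempfile

def pending_genes(genes, out_dir, suffix=".susie.rds"):
    """Return only the genes whose expected output file is missing,
    so a resumed run submits just the unfinished substeps.

    Note: this only checks file existence, unlike SoS signatures,
    which also verify that the files match the recorded run."""
    out = Path(out_dir)
    return [g for g in genes if not (out / f"{g}{suffix}").exists()]

# Toy demonstration with a temporary output directory:
with tempfile.TemporaryDirectory() as d:
    Path(d, "ENSG0001.susie.rds").touch()  # pretend this gene finished
    todo = pending_genes(["ENSG0001", "ENSG0002"], d)
    print(todo)  # only the unfinished gene remains
```

The filtered list can then be passed to the workflow in place of the full gene list, avoiding the signature scan over 10,000+ completed substeps.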
As shown in the following log, it took 70 GB of memory to scan through ~13,000 tasks:
==============================================================
qname csg2.q
hostname node85
group hs3163
owner hs3163
project NONE
department defaultdepartment
jobname susie_per_gene
jobnumber 3077967
taskid undefined
account sge
priority 0
qsub_time Sun Oct 30 17:13:13 2022
start_time Sun Oct 30 17:13:27 2022
end_time Sun Oct 30 19:04:41 2022
granted_pe NONE
slots 1
failed 37 : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status 137 (Killed)
ru_wallclock 6674s
ru_utime 0.052s
ru_stime 0.034s
ru_maxrss 6.656KB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 4128
ru_majflt 0
ru_nswap 0
ru_inblock 32
ru_oublock 8
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 1548
ru_nivcsw 3
cpu 9934.060s
mem 170.661TBs
io 209.761GB
iow 0.000s
maxvmem 70.000GB
arid undefined
ar_sub_time undefined
category -U statg-users -u hs3163 -q csg2.q,high_mem.q -l h_rt=72000,h_vmem=70G,mem_pri=TRUE,temp_pri=TRUE