
Improvement for running per gene/traits analysis

Open hsun3163 opened this issue 3 years ago • 1 comments

For tasks with many subtasks (10,000+), it is rather common for the sos process to be interrupted on the cluster for all sorts of reasons, e.g. being kicked off a node or running out of memory.

When resuming the job with -s build, sos always takes quite a long time to loop through the existing output files in order to skip the corresponding subtasks. I wonder whether there is a way to optimize this behavior so that we can start directly from the next step. Currently, if we specify step_2, whose input is the output of step_1, sos considers the input to be empty.
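To illustrate the kind of shortcut I have in mind, here is a minimal sketch in plain Python, done outside of sos, that works out which genes still need to run. It is not an sos feature. The function name `pending_genes`, the `<gene>.rds` output naming, and the rule that an empty file counts as unfinished are all assumptions made for this example.

```python
from pathlib import Path


def pending_genes(genes, out_dir, suffix=".rds"):
    """Return the genes whose expected output file is missing or empty.

    Hypothetical helper: assumes each subtask writes `<gene><suffix>`
    into `out_dir`. Adjust the naming to match the actual workflow.
    """
    out_dir = Path(out_dir)
    # List the output directory once instead of checking each gene separately.
    # Zero-byte files are treated as leftovers of an interrupted subtask.
    done = {
        p.name
        for p in out_dir.glob(f"*{suffix}")
        if p.stat().st_size > 0
    }
    # Preserve the original order of the gene list.
    return [g for g in genes if f"{g}{suffix}" not in done]
```

The returned list could be written to a new region list and passed to the workflow, so that only the unfinished genes are submitted on the rerun.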

hsun3163, Oct 30 '22 19:10

As shown in the following log, it takes 70 GB of memory to scan through 13,000 tasks:

==============================================================
qname        csg2.q
hostname     node85
group        hs3163
owner        hs3163
project      NONE
department   defaultdepartment
jobname      susie_per_gene
jobnumber    3077967
taskid       undefined
account      sge
priority     0
qsub_time    Sun Oct 30 17:13:13 2022
start_time   Sun Oct 30 17:13:27 2022
end_time     Sun Oct 30 19:04:41 2022
granted_pe   NONE
slots        1
failed       37  : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status  137                  (Killed)
ru_wallclock 6674s
ru_utime     0.052s
ru_stime     0.034s
ru_maxrss    6.656KB
ru_ixrss     0.000B
ru_ismrss    0.000B
ru_idrss     0.000B
ru_isrss     0.000B
ru_minflt    4128
ru_majflt    0
ru_nswap     0
ru_inblock   32
ru_oublock   8
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     1548
ru_nivcsw    3
cpu          9934.060s
mem          170.661TBs
io           209.761GB
iow          0.000s
maxvmem      70.000GB
arid         undefined
ar_sub_time  undefined
category     -U statg-users -u hs3163 -q csg2.q,high_mem.q -l h_rt=72000,h_vmem=70G,mem_pri=TRUE,temp_pri=TRUE

hsun3163, Oct 31 '22 00:10