Actual mem usage of TensorQTL
- When the requested memory is 50G, the run fails with a not-enough-memory error. The qacct record is:
==============================================================
qname csg2.q
hostname node88
group hs3163
owner hs3163
project NONE
department defaultdepartment
jobname job_t49524335528fbed8
jobnumber 2979310
taskid undefined
account sge
priority 0
qsub_time Tue Oct 18 16:49:04 2022
start_time Tue Oct 18 16:49:10 2022
end_time Tue Oct 18 16:55:55 2022
granted_pe openmp
slots 8
failed 0
exit_status 0
ru_wallclock 405s
ru_utime 6827.551s
ru_stime 2150.897s
ru_maxrss 4.972MB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 278753626
ru_majflt 2100
ru_nswap 0
ru_inblock 204112
ru_oublock 839560
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 861168
ru_nivcsw 5466124
cpu 8978.448s
mem 32.263TBs
io 1.020GB
iow 0.000s
maxvmem 4.998GB
arid undefined
ar_sub_time undefined
category -U statg-users -u hs3163 -q csg2.q -l h_rt=43200,h_vmem=50G,temp_pri=TRUE -pe openmp 8
- When the requested memory is 80G, the run succeeds. The qacct record is:
==============================================================
qname csg2.q
hostname node48
group hs3163
owner hs3163
project NONE
department defaultdepartment
jobname job_t142e53bc69b04921
jobnumber 2979665
taskid undefined
account sge
priority 0
qsub_time Tue Oct 18 17:16:50 2022
start_time Tue Oct 18 17:16:59 2022
end_time Tue Oct 18 17:42:48 2022
granted_pe openmp
slots 8
failed 0
exit_status 0
ru_wallclock 1549s
ru_utime 26118.842s
ru_stime 11915.166s
ru_maxrss 7.781MB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 1161553580
ru_majflt 1485
ru_nswap 0
ru_inblock 204128
ru_oublock 2243944
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 2778990
ru_nivcsw 7072254
cpu 38034.009s
mem 207.808TBs
io 2.515GB
iow 0.000s
maxvmem 7.814GB
arid undefined
ar_sub_time undefined
category -U statg-users -u hs3163 -q csg2.q -l h_rt=43200,h_vmem=80G,temp_pri=TRUE -pe openmp 8
However, as shown above, the actual memory used (maxvmem) is less than 8G in both runs.
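For reference, the requested h_vmem and the observed maxvmem compared here can be read back out of a saved qacct record with a short script. This is only a minimal sketch, not part of the protocol; the dump file name is hypothetical, and the field names follow the records above.

```python
# Minimal sketch: parse a saved `qacct -j <jobid>` record (like the two above)
# and compare the requested h_vmem with the observed maxvmem.
# The dump file name is hypothetical.
import re

def parse_qacct(text):
    """Turn the 'key   value' lines of a qacct record into a dict."""
    rec = {}
    for line in text.splitlines():
        if line.startswith("="):
            continue  # skip the ====== separator line
        parts = line.split(None, 1)
        if len(parts) == 2:
            rec[parts[0]] = parts[1].strip()
    return rec

with open("qacct_2979665.txt") as fh:
    rec = parse_qacct(fh.read())

requested = re.search(r"h_vmem=([^,\s]+)", rec["category"]).group(1)
print("requested h_vmem:", requested)        # 80G
print("observed  maxvmem:", rec["maxvmem"])  # 7.814GB
```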
The memory error is as follows:
hs3163@node39:/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing$ cat /mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing/output10/association_scan/protocol-eqtl/TensorQTL1/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.per_chrom_xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.cov_pca.resid.PEER.cov.22.cis_qtl_pairs.22.parquet.stderr
Mapping files: 100%|██████████| 3/3 [00:03<00:00, 1.12s/it]
Traceback (most recent call last):
File "/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing/tmp_akn7w9g/singularity_run_4930.py", line 54, in <module>
lambda_col = pairs_df.groupby("molecular_trait_object_id").apply( lambda x: chi2.ppf(1. - np.median(x.pval_nominal), 1)/chi2.ppf(0.5,1))
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 894, in apply
result = self._python_apply_general(f, self._selected_obj)
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 928, in _python_apply_general
keys, values, mutated = self.grouper.apply(f, data, self.axis)
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 185, in apply
splitter = self._get_splitter(data, axis=axis)
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 160, in _get_splitter
comp_ids, _, ngroups = self.group_info
File "pandas/_libs/properties.pyx", line 33, in pandas._libs.properties.CachedProperty.__get__
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 300, in group_info
comp_ids, obs_group_ids = self._get_compressed_codes()
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 318, in _get_compressed_codes
all_codes = self.codes
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 259, in codes
return [ping.codes for ping in self.groupings]
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 259, in <listcomp>
return [ping.codes for ping in self.groupings]
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 591, in codes
self._make_codes()
File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 623, in _make_codes
codes, uniques = algorithms.factorize(
File "/opt/conda/lib/python3.8/site-packages/pandas/core/algorithms.py", line 722, in factorize
codes, uniques = factorize_array(
File "/opt/conda/lib/python3.8/site-packages/pandas/core/algorithms.py", line 528, in factorize_array
uniques, codes = table.factorize(
File "pandas/_libs/hashtable_class_helper.pxi", line 4509, in pandas._libs.hashtable.StringHashTable.factorize
File "pandas/_libs/hashtable_class_helper.pxi", line 4399, in pandas._libs.hashtable.StringHashTable._unique
numpy.core._exceptions.MemoryError: Unable to allocate 111. MiB for an array with shape (14609477,) and data type int64
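The line that runs out of memory is the genomic-inflation lambda computation over the cis pairs table. Below is a minimal sketch of that computation, together with an equivalent but lighter formulation that aggregates the median p-value per gene before the chi-square transform; the parquet path is hypothetical, and the column names follow the traceback above.

```python
# Minimal sketch of the lambda computation from the traceback above.
# Assumes a TensorQTL *.cis_qtl_pairs.*.parquet table with columns
# "molecular_trait_object_id" and "pval_nominal"; the path is hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import chi2

pairs_df = pd.read_parquet("xqtl_protocol_data.cis_qtl_pairs.22.parquet")

# Original formulation: groupby().apply() materializes a sub-DataFrame per
# gene, which is where the MemoryError is raised when the pairs table is
# unexpectedly large (here, ~14.6 million rows for a single chromosome).
lambda_col = pairs_df.groupby("molecular_trait_object_id").apply(
    lambda x: chi2.ppf(1.0 - np.median(x.pval_nominal), 1) / chi2.ppf(0.5, 1)
)

# Lighter equivalent: reduce to one median p-value per gene first, then apply
# the chi-square transform once, vectorized over the resulting Series.
med = pairs_df.groupby("molecular_trait_object_id")["pval_nominal"].median()
lambda_alt = pd.Series(chi2.ppf(1.0 - med, 1) / chi2.ppf(0.5, 1), index=med.index)
```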
- Based on a discussion with the system admin Jose, it turns out that there was a ~55G memory usage spike that qacct and monitor.py failed to capture (see the polling sketch after this list).
- The unreasonably high memory usage for these two chromosomes was due to an earlier bug where none of the SNPs were filtered out. This bug has already been fixed.
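Since the ~55G spike was too short-lived for qacct's maxvmem and monitor.py's sampling to capture, one way to confirm such spikes is to poll the process RSS at a high frequency while the job runs. The sketch below is an illustration only, not monitor.py, and assumes psutil is available on the compute node.

```python
# Minimal sketch: poll a running process (and its children) and report the
# peak resident set size seen. Not monitor.py; assumes psutil is installed.
import sys
import time
import psutil

def watch_peak_rss(pid, interval=0.1):
    """Sample the total RSS of `pid` and its children every `interval` seconds."""
    proc = psutil.Process(pid)
    peak = 0
    while proc.is_running():
        try:
            rss = proc.memory_info().rss
            rss += sum(c.memory_info().rss for c in proc.children(recursive=True))
        except psutil.NoSuchProcess:
            break
        peak = max(peak, rss)
        time.sleep(interval)
    return peak

if __name__ == "__main__":
    peak = watch_peak_rss(int(sys.argv[1]))
    print(f"peak RSS: {peak / 1e9:.2f} GB")
```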