
Actual memory usage of TensorQTL

hsun3163 opened this issue on Oct 18, 2022 · 2 comments

  1. When the requested memory is 50G, the run fails with an out-of-memory error:
==============================================================
qname        csg2.q
hostname     node88
group        hs3163
owner        hs3163
project      NONE
department   defaultdepartment
jobname      job_t49524335528fbed8
jobnumber    2979310
taskid       undefined
account      sge
priority     0
qsub_time    Tue Oct 18 16:49:04 2022
start_time   Tue Oct 18 16:49:10 2022
end_time     Tue Oct 18 16:55:55 2022
granted_pe   openmp
slots        8
failed       0
exit_status  0
ru_wallclock 405s
ru_utime     6827.551s
ru_stime     2150.897s
ru_maxrss    4.972MB
ru_ixrss     0.000B
ru_ismrss    0.000B
ru_idrss     0.000B
ru_isrss     0.000B
ru_minflt    278753626
ru_majflt    2100
ru_nswap     0
ru_inblock   204112
ru_oublock   839560
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     861168
ru_nivcsw    5466124
cpu          8978.448s
mem          32.263TBs
io           1.020GB
iow          0.000s
maxvmem      4.998GB
arid         undefined
ar_sub_time  undefined
category     -U statg-users -u hs3163 -q csg2.q -l h_rt=43200,h_vmem=50G,temp_pri=TRUE -pe openmp 8
  2. When the requested memory is 80G, the run succeeds:
==============================================================
qname        csg2.q
hostname     node48
group        hs3163
owner        hs3163
project      NONE
department   defaultdepartment
jobname      job_t142e53bc69b04921
jobnumber    2979665
taskid       undefined
account      sge
priority     0
qsub_time    Tue Oct 18 17:16:50 2022
start_time   Tue Oct 18 17:16:59 2022
end_time     Tue Oct 18 17:42:48 2022
granted_pe   openmp
slots        8
failed       0
exit_status  0
ru_wallclock 1549s
ru_utime     26118.842s
ru_stime     11915.166s
ru_maxrss    7.781MB
ru_ixrss     0.000B
ru_ismrss    0.000B
ru_idrss     0.000B
ru_isrss     0.000B
ru_minflt    1161553580
ru_majflt    1485
ru_nswap     0
ru_inblock   204128
ru_oublock   2243944
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     2778990
ru_nivcsw    7072254
cpu          38034.009s
mem          207.808TBs
io           2.515GB
iow          0.000s
maxvmem      7.814GB
arid         undefined
ar_sub_time  undefined
category     -U statg-users -u hs3163 -q csg2.q -l h_rt=43200,h_vmem=80G,temp_pri=TRUE -pe openmp 8

However, as shown above, the memory usage reported by qacct (maxvmem) is less than 8G in both runs, far below the 50G request that failed.
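As a back-of-the-envelope check, the accounting fields themselves point the same way. A minimal sketch, under two unverified assumptions: that SGE's mem field is the memory-time integral in GB·s over CPU seconds, and that 1 TB·s = 1024 GB·s:

# Sanity check of the qacct numbers above (assumptions noted in the lead-in:
# "mem" is the memory integral in GB*s over CPU time, 1 TB*s = 1024 GB*s).
runs = {
    "50G request": {"mem_tbs": 32.263, "cpu_s": 8978.448, "maxvmem_gb": 4.998},
    "80G request": {"mem_tbs": 207.808, "cpu_s": 38034.009, "maxvmem_gb": 7.814},
}
for name, r in runs.items():
    avg_gb = r["mem_tbs"] * 1024 / r["cpu_s"]  # implied average resident memory
    print(f"{name}: avg ~{avg_gb:.2f} GB, peak (maxvmem) {r['maxvmem_gb']} GB")

Under those assumptions, both averages come out under 6 GB, consistent with the sub-8G maxvmem peaks.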

hsun3163 · Oct 18, 2022

The memory error is as follows:

hs3163@node39:/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing$ cat /mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing/output10/association_scan/protocol-eqtl/TensorQTL1/xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.bed.per_chrom_xqtl_protocol_data.rnaseqc.low_expression_filtered.outlier_removed.tmm.expression.cov_pca.resid.PEER.cov.22.cis_qtl_pairs.22.parquet.stderr
Mapping files: 100%|██████████| 3/3 [00:03<00:00,  1.12s/it]
Traceback (most recent call last):
  File "/mnt/vast/hpc/csg/xqtl_workflow_testing/finalizing/tmp_akn7w9g/singularity_run_4930.py", line 54, in <module>
    lambda_col = pairs_df.groupby("molecular_trait_object_id").apply( lambda x:  chi2.ppf(1. - np.median(x.pval_nominal), 1)/chi2.ppf(0.5,1))
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 894, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 928, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 185, in apply
    splitter = self._get_splitter(data, axis=axis)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 160, in _get_splitter
    comp_ids, _, ngroups = self.group_info
  File "pandas/_libs/properties.pyx", line 33, in pandas._libs.properties.CachedProperty.__get__
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 300, in group_info
    comp_ids, obs_group_ids = self._get_compressed_codes()
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 318, in _get_compressed_codes
    all_codes = self.codes
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 259, in codes
    return [ping.codes for ping in self.groupings]
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/ops.py", line 259, in <listcomp>
    return [ping.codes for ping in self.groupings]
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 591, in codes
    self._make_codes()
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 623, in _make_codes
    codes, uniques = algorithms.factorize(
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/algorithms.py", line 722, in factorize
    codes, uniques = factorize_array(
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/algorithms.py", line 528, in factorize_array
    uniques, codes = table.factorize(
  File "pandas/_libs/hashtable_class_helper.pxi", line 4509, in pandas._libs.hashtable.StringHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 4399, in pandas._libs.hashtable.StringHashTable._unique
numpy.core._exceptions.MemoryError: Unable to allocate 111. MiB for an array with shape (14609477,) and data type int64
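The failing line computes a per-trait genomic inflation factor λ through groupby().apply(), which runs a Python callback per group and builds intermediate hash tables over the ~14.6M pairs. Below is a minimal sketch of an equivalent vectorized form, assuming pairs_df carries the molecular_trait_object_id and pval_nominal columns seen in the traceback; this is only an illustration, not the protocol's actual fix:

import pandas as pd
from scipy.stats import chi2

def lambda_gc_per_trait(pairs_df: pd.DataFrame) -> pd.Series:
    # Take the median nominal p-value per trait first, then apply chi2.ppf
    # once over the whole vector of medians instead of once per group.
    med = pairs_df.groupby("molecular_trait_object_id")["pval_nominal"].median()
    return pd.Series(
        chi2.ppf(1.0 - med.to_numpy(), df=1) / chi2.ppf(0.5, df=1),
        index=med.index,
        name="lambda_gc",
    )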

hsun3163 · Oct 18, 2022

  1. Based on a discussion with the system admin, Jose, it turns out there was a ~55G memory usage spike that qacct and monitor.py failed to capture; sampling-based monitors can miss short spikes (see the sketch below).
  2. The unreasonably high memory usage for these two chromosomes was caused by an earlier bug in which none of the SNPs were filtered out. That bug has already been fixed.
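One way to record the true high-water mark from inside the job, regardless of how briefly the spike lasts, is to read the kernel-tracked peak RSS at interpreter exit. A minimal sketch, assuming Linux (where ru_maxrss is reported in kilobytes); log_peak_rss is a hypothetical helper, not part of monitor.py:

import atexit
import resource

def log_peak_rss() -> None:
    # ru_maxrss is the kernel-tracked peak resident set size; on Linux it is
    # reported in kilobytes and captures spikes that periodic sampling misses.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"peak RSS: {peak_kb / 1024**2:.2f} GB")

atexit.register(log_peak_rss)  # fires when the interpreter shuts down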

hsun3163 · Oct 20, 2022