ZeroDivisionError with small signature sets (<100)
I am implementing TGCN in Python, need to run GSEA on very small sets (10-100), and am running into the error below. I have attached the signature and gene sets I used to reproduce this error:
repro_signature.csv repro_geneset.json
Traceback (most recent call last):
  File ".../multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 39, in estimate_anchor_star
    return estimate_anchor(*args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 42, in estimate_anchor
    es = np.array(get_peak_size_adv(abs_signature, set_size, permutations, int(seed)))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 153, in get_peak_size_adv
    es_val = enrichment_score_null(abs_signature, hit_indicator.copy(), number_hits)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 111, in enrichment_score_null
    norm_no_hit = 1.0 / number_miss
                  ~~~~^~~~~~~~~~~~~
ZeroDivisionError: float division by zero
This error occurs in enrichment_score_null when number_hits equals len(abs_signature), which makes number_miss = 0. I believe this only happens when an anchor_set_size equals len(abs_signature). I fixed the problem by changing how anchor_set_sizes is generated on line 175:
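To make the failure mode concrete, here is a minimal sketch of the weighted running-sum enrichment score used in GSEA. It borrows the variable names from the traceback (number_hits, number_miss, norm_no_hit), but it is a hypothetical helper (enrichment_score_sketch) written for illustration, not blitzgsea's actual enrichment_score_null; the toy signature is also made up.

```python
import numpy as np


def enrichment_score_sketch(abs_signature, hit_indicator):
    """Hypothetical GSEA-style running-sum ES, for illustration only.

    abs_signature: absolute signature weights, sorted in descending order.
    hit_indicator: 1.0 where the gene is in the set, 0.0 otherwise.
    """
    number_hits = int(hit_indicator.sum())
    number_miss = len(abs_signature) - number_hits  # 0 when every gene is a hit
    norm_hit = 1.0 / float(np.sum(abs_signature * hit_indicator))
    # Plain Python int division: raises ZeroDivisionError when number_miss == 0
    norm_no_hit = 1.0 / number_miss
    # Hits step up by their normalized weight, misses step down by 1/number_miss
    running_sum = np.cumsum(
        hit_indicator * abs_signature * norm_hit - (1 - hit_indicator) * norm_no_hit
    )
    # ES is the running-sum value furthest from zero
    return running_sum[np.argmax(np.abs(running_sum))]


signature = np.sort(np.random.default_rng(0).random(20))[::-1]

# Half the genes are hits (the top 10), so number_miss = 10 and the ES is ~1.0
print(enrichment_score_sketch(signature, np.array([1.0] * 10 + [0.0] * 10)))

# Every gene is a hit, so number_miss = 0: same condition as line 111 above
try:
    enrichment_score_sketch(signature, np.ones(20))
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```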
from: anchor_set_sizes = [size for size in anchor_set_sizes if size <= abs_signature_length]
to: anchor_set_sizes = [size for size in anchor_set_sizes if size < abs_signature_length]
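The effect of the one-character change can be checked in isolation. The sketch below wraps both comprehensions in a hypothetical filter_anchor_sizes helper; the anchor sizes [1, 5, 10, 20, 50, 100] and the signature length of 50 are made-up values chosen so that one anchor equals the signature length, not blitzgsea's real defaults.

```python
def filter_anchor_sizes(anchor_set_sizes, abs_signature_length, inclusive):
    """Hypothetical helper reproducing the two filters from the issue.

    inclusive=True mirrors the original `size <= abs_signature_length`;
    inclusive=False mirrors the proposed `size < abs_signature_length`.
    """
    if inclusive:
        return [size for size in anchor_set_sizes if size <= abs_signature_length]
    return [size for size in anchor_set_sizes if size < abs_signature_length]


# Made-up values for illustration; blitzgsea's actual anchor list may differ.
anchor_set_sizes = [1, 5, 10, 20, 50, 100]
abs_signature_length = 50

for inclusive in (True, False):
    kept = filter_anchor_sizes(anchor_set_sizes, abs_signature_length, inclusive)
    # number_miss = signature length - set size; a 0 here triggers the division error
    misses = [abs_signature_length - size for size in kept]
    print(inclusive, kept, 0 in misses)
# True [1, 5, 10, 20, 50] True
# False [1, 5, 10, 20] False
```

With `<`, only the anchor whose number_miss would be 0 is dropped. That anchor carries no null information in any case: sampling set_size genes without replacement from a signature of the same length always returns every gene, so every permutation produces the same set. How blitzgsea then scores a query set larger than the largest remaining anchor would need to be checked in the library source.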
I am wondering whether this change affects the p-value calculation at all.