ZeroDivisionError with small signature sets (<100)

Open terynin opened this issue 4 months ago • 0 comments

I am implementing TGCN in Python and need to run GSEA on very small sets (10-100), and I am running into the error below. I have attached the signature and gene sets used to replicate this error:

repro_signature.csv repro_geneset.json

Traceback (most recent call last):
  File ".../multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 39, in estimate_anchor_star
    return estimate_anchor(*args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 42, in estimate_anchor
    es = np.array(get_peak_size_adv(abs_signature, set_size, permutations, int(seed)))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 153, in get_peak_size_adv
    es_val = enrichment_score_null(abs_signature, hit_indicator.copy(), number_hits)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../blitzgsea/__init__.py", line 111, in enrichment_score_null
    norm_no_hit = 1.0 / number_miss
                  ~~~~^~~~~~~~~~~~~
ZeroDivisionError: float division by zero

This error occurs in enrichment_score_null when len(abs_signature) == number_hits, which makes number_miss = 0. I believe this only happens when an anchor set size equals len(abs_signature). I fixed the problem by changing the anchor_set_sizes generation on line 175 from:

    anchor_set_sizes = [size for size in anchor_set_sizes if size <= abs_signature_length]

to:

    anchor_set_sizes = [size for size in anchor_set_sizes if size < abs_signature_length]
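To make the failure condition concrete, here is a minimal standalone sketch that does not call blitzgsea. The helper names miss_step and filter_anchor_sizes, the example sizes, and the assumption that number_miss is computed as the signature length minus number_hits are all mine, chosen for illustration only; the actual library code may differ.

```python
# Standalone illustration; nothing here is blitzgsea's actual API.

def miss_step(signature_length, number_hits):
    """Per-miss decrement of the running sum (hypothetical helper).

    Assumes number_miss = signature_length - number_hits, which is what
    the traceback suggests but is not confirmed against the library.
    """
    number_miss = signature_length - number_hits
    return 1.0 / number_miss  # raises ZeroDivisionError when number_miss == 0


def filter_anchor_sizes(anchor_set_sizes, signature_length):
    """Keep only sizes strictly smaller than the signature (hypothetical helper).

    A strict '<' guarantees at least one miss for every remaining size.
    """
    return [size for size in anchor_set_sizes if size < signature_length]


signature_length = 12                # a very small signature
candidate_sizes = [1, 4, 6, 12, 20]  # made-up anchor sizes for the example

# With '<=' the size 12 survives and every gene is a hit.
try:
    miss_step(signature_length, number_hits=12)
    failed = False
except ZeroDivisionError:
    failed = True

safe_sizes = filter_anchor_sizes(candidate_sizes, signature_length)
safe_steps = [miss_step(signature_length, size) for size in safe_sizes]
```

In this sketch the strict comparison drops only the single anchor size that equals the signature length; all smaller sizes are kept unchanged.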

I am wondering whether this change will affect the calculation of p-values at all.

Oct 17 '25 17:10 terynin