
Cooltools pileup slow with mixed chromosomes

Open bskubi opened this issue 7 months ago • 5 comments

When running cooltools pileup on 1000 random regions from a single chromosome (chr1 or chr2), it extracts snips in about 3 minutes. However, if I take the first 500 random regions from each of the chr1 and chr2 samples, combine them, and run the same cooltools pileup command, it runs for over 30 minutes (more than 10x longer) without completing.

Here's an example of the command I'm using. The only difference is the regions I insert, which may be 1000 regions from chr1 exclusively (fast), 1000 regions from chr2 exclusively (fast), or 500 regions each from chr1 and chr2 (very slow).

srun cooltools pileup --out-format HDF5 --flank 6000 --features-format bed --out snips_1_samp.hdf5 --store-snips --nproc 12 --ignore-diags 0 --clr-weight-name RU inter.mcool::/resolutions/200 chr1.tsv

I would have expected the runtime to be at most ~2x slower when two chromosomes are combined, not 10x+ slower. For now, I'll just run each chromosome separately and combine after the fact (not a big deal), but it might be worth addressing -- I almost didn't use cooltools pileup at first because it seemed so slow, until I figured out this was the underlying issue.

Note -- the regions I'm using are each 1bp in size for this example.
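
For anyone wanting to reproduce the per-chromosome workaround mentioned above, here is a minimal sketch of how the splitting step could be scripted. It assumes the features file is tab-separated, has no header row, and has the chromosome name in the first column; the function names `split_features_by_chrom` and `pileup_command` and the output file names are hypothetical, and the flags are simply copied from the command in this issue. Combining the per-chromosome outputs afterwards is not shown.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_features_by_chrom(features_path, out_dir):
    """Write one features file per chromosome and return {chrom: path}.

    Assumes a tab-separated file without a header row whose first
    column is the chromosome name (an assumption, not a cooltools rule).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by the value in the first column, keeping input order.
    rows_by_chrom = defaultdict(list)
    with open(features_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            rows_by_chrom[row[0]].append(row)

    paths = {}
    for chrom, rows in rows_by_chrom.items():
        path = out_dir / f"{chrom}.tsv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerows(rows)
        paths[chrom] = path
    return paths


def pileup_command(cool_uri, features_path, out_path, nproc=1):
    """Build the pileup command from this issue for one features file.

    Returns an argument list; nothing is executed here.
    """
    return [
        "cooltools", "pileup",
        "--out-format", "HDF5",
        "--flank", "6000",
        "--features-format", "bed",
        "--out", str(out_path),
        "--store-snips",
        "--nproc", str(nproc),
        "--ignore-diags", "0",
        "--clr-weight-name", "RU",
        cool_uri,
        str(features_path),
    ]
```

Each returned argument list could then be passed to `subprocess.run` or submitted as its own SLURM job, one chromosome at a time, so that only a single chromosome is processed per run.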

bskubi avatar Jun 30 '25 21:06 bskubi

This is very strange. Perhaps the issue comes from using too many cores... Since there is no point using more cores than chromosomes, maybe multiprocessing gets stuck somewhere. Can you try using 1 or 2 cores?

Phlya avatar Jun 30 '25 21:06 Phlya

Yes, I can try that, thanks for the tip. Also, I'm running on SLURM. I let the chr1_chr2 condition run for > 30 minutes, and it had a status of RUNNING throughout that time. I just saw that when I manually cancel the process, SLURM doesn't give a final status of CANCELLED as expected, but a status of OUT_OF_MEMORY. Is it possible that cooltools hangs when it runs out of memory rather than shutting down?

bskubi avatar Jun 30 '25 22:06 bskubi

Ah, that is complicated. Indeed, processing two chromosomes in parallel will require more memory, and strange things can happen... I am not sure whether it's guaranteed to fail or might get stuck, again maybe because of some strange interaction with multiprocessing or something else. If that's the case, you can just use a single core; then it will process one chromosome at a time and shouldn't run out of memory.

Phlya avatar Jun 30 '25 22:06 Phlya

OK, thank you for the tips!

bskubi avatar Jun 30 '25 22:06 bskubi

Did it help?

Phlya avatar Jul 06 '25 08:07 Phlya