samtools 1.15 becomes very slow when I set maxcnt to 8,000,000
I want to use samtools to handle extremely high depth data, so I changed iter->maxcnt = 8000; in sam.c to 8000000. It then becomes so slow that it is unusable. I made the same change in samtools 1.12 and it works like a charm. After some debugging and comparing the differences between 1.12 and 1.15, I noticed that ks_introsort(node, tv->n_nodes, tv->aux); in bam_lpileup.c of 1.12 was replaced by splaysort(node, tv->n_nodes, tv->aux); in 1.15. After renaming splaysort back to ks_introsort, 1.15 works as expected.
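(Aside, to make the klib idiom concrete: the ks_introsort call above is generated by macros in ksort.h. Below is a minimal, self-contained illustration of that calling pattern; the node_t type and node_lt comparator are invented for the example and are not the real definitions in bam_lpileup.c.)

    /* Illustration only: how klib's ksort.h introsort is set up and called.
     * node_t and node_lt are made up for this example; bam_lpileup.c defines
     * its own element type and comparator. Build with ksort.h (bundled with
     * the samtools/htslib sources) on the include path. */
    #include <stdio.h>
    #include "ksort.h"

    typedef struct { int level; } node_t;            /* hypothetical element type */
    #define node_lt(a, b) ((a).level < (b).level)    /* hypothetical "less than"  */
    KSORT_INIT(node, node_t, node_lt)                /* generates ks_introsort_node() */

    int main(void) {
        node_t v[5] = {{3}, {1}, {4}, {1}, {5}};
        ks_introsort(node, 5, v);   /* same pattern as ks_introsort(node, tv->n_nodes, tv->aux) */
        for (int i = 0; i < 5; i++) printf("%d ", v[i].level);
        putchar('\n');
        return 0;
    }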
So my conclusion is that splaysort performs very poorly when maxcnt is large. Is this a bug or is it expected? Why was ks_introsort replaced by splaysort in 1.15?
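(For what it's worth, the depth cap can also be raised at run time through htslib's public pileup API rather than by editing sam.c. Below is a minimal sketch of a standalone pileup program calling bam_plp_set_maxcnt; it is not the tview code path, and the file name and region argument are just placeholders.)

    /* Minimal sketch, not the tview code: a standalone pileup that raises the
     * depth cap via htslib's public API (bam_plp_set_maxcnt) instead of
     * patching iter->maxcnt in sam.c. Build with something like:
     *   gcc plp.c -lhts -o plp
     * and run as: ./plp in.bam chr:start-end */
    #include <stdio.h>
    #include <htslib/sam.h>

    typedef struct {
        samFile *fp;
        sam_hdr_t *hdr;
        hts_itr_t *itr;
    } plp_dat;

    /* Callback handing one alignment at a time to the pileup engine. */
    static int read_bam(void *data, bam1_t *b) {
        plp_dat *d = (plp_dat *)data;
        return sam_itr_next(d->fp, d->itr, b);
    }

    int main(int argc, char **argv) {
        if (argc < 3) {
            fprintf(stderr, "Usage: %s in.bam region\n", argv[0]);
            return 1;
        }
        plp_dat d = {0};
        d.fp = sam_open(argv[1], "r");
        if (!d.fp) { fprintf(stderr, "Failed to open %s\n", argv[1]); return 1; }
        d.hdr = sam_hdr_read(d.fp);
        hts_idx_t *idx = sam_index_load(d.fp, argv[1]);
        if (!idx) { fprintf(stderr, "Failed to load index for %s\n", argv[1]); return 1; }
        d.itr = sam_itr_querys(idx, d.hdr, argv[2]);
        if (!d.itr) { fprintf(stderr, "Failed to parse region %s\n", argv[2]); return 1; }

        bam_plp_t iter = bam_plp_init(read_bam, &d);
        bam_plp_set_maxcnt(iter, 8000000);   /* the depth cap this issue is about */

        /* Print reference name, 1-based position and depth for each column. */
        int tid, pos, n;
        const bam_pileup1_t *plp;
        while ((plp = bam_plp_auto(iter, &tid, &pos, &n)) != NULL)
            printf("%s\t%d\t%d\n", sam_hdr_tid2name(d.hdr, tid), pos + 1, n);

        bam_plp_destroy(iter);
        hts_itr_destroy(d.itr);
        hts_idx_destroy(idx);
        sam_hdr_destroy(d.hdr);
        sam_close(d.fp);
        return 0;
    }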
As the NEWS file for 1.15 says:
Switch to using splaysort in bam_lpileup. Improves speed and efficiency in "tview".
I don't think anyone has tried it on extremely high depth data.
I'm struggling to reproduce the claimed speed increase in tview. https://github.com/samtools/samtools/pull/1548 shows the data it was tested on.
When I did the testing at the time, I didn't realise that tview had a maximum depth limit, given that it has no option to change this depth. So it was high depth data, but capped at 8000, so not really that high. The test was vertically scrolling through a large pileup.
Repeating that test with the default maxcnt of 8000, I see:
@ seq4c[samtools.../samtools]130; perl -e 'print "K" x 50, "q"' | time ./samtools.introsort tview ~/scratch/data/34005_1#18.fixmate.bam -p MN908947.3:500
27.72user 0.19system 0:27.93elapsed 99%CPU (0avgtext+0avgdata 43388maxresident)k
0inputs+0outputs (0major+10110minor)pagefaults 0swaps
@ seq4c[samtools.../samtools]; perl -e 'print "K" x 50, "q"' | time ./samtools.splaysort tview ~/scratch/data/34005_1#18.fixmate.bam -p MN908947.3:500
25.08user 0.25system 0:25.34elapsed 99%CPU (0avgtext+0avgdata 43828maxresident)k
0inputs+0outputs (0major+10183minor)pagefaults 0swaps
So that's 27.7s to 25.1s going from introsort to splaysort.
With a bam_plp_set_maxcnt(buf->iter, 20000); call added, those figures are 50.9s to 41.2s.
At a maxcnt of 100000 it becomes very slow, with a woeful 415s for introsort and 398s for splaysort. In all cases here splaysort is slightly faster than introsort.
With a single test rather than scrolling, doing just samtools tview -p MN908947.3:900 -d T /nfs/users/nfs_j/jkb/scratch/data/34005_1#18.fixmate.bam|wc, I see introsort at 13.6s and splaysort at 15.9s. So that's now the reverse, with introsort slightly faster than splaysort.
However, in both cases performance is, frankly, pretty tragic. I certainly don't see any enormous changes here. What sort of speed differences were you seeing?
Clearly with this test (vertical scrolling) it ought to be almost instant, yet every vertical scroll causes a complete reload of the entire data. I don't understand why, having laid out the records, it doesn't just redisplay one line lower down! I guess it's discarding the pileup data. It's probably worth figuring that out too, as it'd give a much larger usability improvement for deep data. However, that's a different issue and unrelated to the sorting functions.