
For consideration: replace cram rANS with htscodecs

Open jkbonfield opened this issue 6 years ago • 2 comments

A quick benchmark of current htslib against a build in which rANS_static.c and rANS_byte.h were manually replaced (just copied over) from https://github.com/jkbonfield/htscodecs shows that decoding a NovaSeq CRAM v3.0 file is 20% faster. That's quite a win for such an easy change!

This is the same code that io_lib (scramble) will be using. See https://github.com/jkbonfield/io_lib/tree/codec-lib which brings in htscodecs as a git submodule. Long term this means we only have one source of complex C codec implementations, which makes them easier to maintain. It's also going to be a big win once we push CRAM v3.1 live.

jkbonfield, May 24 '19 13:05

Might also help with https://oss-fuzz.com/testcase-detail/5720834143944704

valeriuo, Oct 25 '19 14:10

Possibly not. Most of the timeouts happen simply because it's possible to generate bit streams of very highly compressed data, so that decoding takes a lot of memory and a long time.

Although it's possible it found something different. I haven't investigated it yet.
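To illustrate the kind of input behind such timeouts, here is a minimal sketch using a toy run-length format, not rANS and not any htslib or htscodecs code. The function `toy_rle_decode`, the `MAX_OUT` cap and the (count, byte) layout are all hypothetical, made up purely for this example. It shows how a few input bytes can declare gigabytes of output, and one possible way to reject that before allocating anything.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on decoded size; a real decoder would pick its own. */
#define MAX_OUT ((size_t)1 << 20)

/*
 * Toy run-length format (NOT rANS): a sequence of 5-byte records, each a
 * 32-bit little-endian repeat count followed by the byte to repeat.
 * Returns a malloc'd buffer and sets *out_len, or returns NULL if the
 * input is malformed or would expand beyond max_out bytes.
 */
static unsigned char *toy_rle_decode(const unsigned char *in, size_t in_len,
                                     size_t max_out, size_t *out_len)
{
    if (in_len % 5 != 0)
        return NULL;

    /* Pass 1: sum the declared run lengths before allocating anything. */
    uint64_t total = 0;
    for (size_t i = 0; i < in_len; i += 5) {
        uint32_t n = (uint32_t)in[i]
                   | (uint32_t)in[i + 1] << 8
                   | (uint32_t)in[i + 2] << 16
                   | (uint32_t)in[i + 3] << 24;
        total += n;
        if (total > max_out)
            return NULL;            /* refuse oversized output early */
    }

    unsigned char *out = malloc(total ? (size_t)total : 1);
    if (!out)
        return NULL;

    /* Pass 2: expand each run into the output buffer. */
    size_t pos = 0;
    for (size_t i = 0; i < in_len; i += 5) {
        uint32_t n = (uint32_t)in[i]
                   | (uint32_t)in[i + 1] << 8
                   | (uint32_t)in[i + 2] << 16
                   | (uint32_t)in[i + 3] << 24;
        memset(out + pos, in[i + 4], n);
        pos += n;
    }
    assert(pos == total);           /* both passes must agree */

    *out_len = pos;
    return out;
}
```

With this sketch, the 5-byte input `ff ff ff ff 41` declares roughly 4 GiB of output and is rejected by the cap, whereas without the check the decoder would try to allocate and fill all of it, which is the sort of behaviour a fuzzer reports as a timeout or out-of-memory.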

jkbonfield, Oct 25 '19 14:10