htslib

Configure HTTPS/libcurl by default because it's now needed to access CRAM reference registry

Open mlin opened this issue 8 years ago • 5 comments

Per @tk2, EBI recently began switching all www connections, including to the CRAM Reference Registry, to HTTPS. This is good even though the references are public data, because it helps to ensure the data can't be corrupted in-flight.

However, it means that htslib now requires HTTPS support (i.e. libcurl, enabled with ./configure --enable-libcurl) in order to access the CRAM Reference Registry. Without it (which is the default!), viewing a CRAM file fails with a cryptic error "Failed to populate reference for id ..."

My suggestion therefore would be to change libcurl from opt-in to opt-out.
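
For anyone hitting the error above, a rough way to check whether an existing build has libcurl is to look at the "Features:" line that recent samtools releases print in `samtools --version`. The sketch below parses a hard-coded sample line rather than calling samtools, so the sample string is illustrative only and `has_feature` is a hypothetical helper, not part of samtools or htslib.

```shell
# Illustrative sample of the "Features:" line printed by `samtools --version`.
# Replace it with real output, e.g.:
#   features=$(samtools --version | grep 'Features:')
features="Features:       build=configure libcurl=no S3=no GCS=no libdeflate=no lzma=yes bzip2=yes plugins=no"

# has_feature LINE NAME: succeed if "NAME=yes" appears as a whole token in LINE.
# (Hypothetical helper written for this example.)
has_feature() {
    case " $1 " in
        *" $2=yes "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_feature "$features" libcurl; then
    curl_status="enabled"
else
    curl_status="missing"
fi
echo "libcurl: $curl_status"
```

If this reports "missing" for a real build, https:// references (including the EBI registry) will not be reachable from it.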

mlin avatar Oct 13 '17 18:10 mlin

Thanks @mlin ! Here is what I did to install samtools properly and resolve the reference sequence issue with CRAM.

git clone https://github.com/samtools/samtools.git
cd samtools

wget https://github.com/samtools/htslib/releases/download/1.21/htslib-1.21.tar.bz2
tar jxvf htslib-1.21.tar.bz2
cd htslib-1.21
./configure --enable-libcurl --prefix=$PWD
make
cd ..

./configure --enable-plugins --enable-libcurl --with-htslib=$PWD/htslib-1.21
make all all-htslib

lcscs12345 avatar Sep 26 '24 23:09 lcscs12345

I was about to close this as I thought we'd done it, but apparently not.

IMO yes, libcurl ought to be opt-out instead of opt-in, as building without it restricts a lot of things and people generally don't bother to read configure output (it's very spammy) unless it fails. I'd argue a similar thing should be done with libdeflate too. Users need to be aware that they're building an implementation that may be 2-3x slower than optimal, so again the choice to do so should be a conscious one rather than accidental.

jkbonfield avatar Sep 30 '24 08:09 jkbonfield

> Thanks @mlin ! Here is what I did to install samtools properly and resolve the reference sequence issue with CRAM.

I'm not really sure this resolves reference sequence issues. Yes, it'll allow CRAM to download from the EBI instead, but that causes other problems (for the EBI, and also for you, as it can be very slow and the service is sometimes unavailable due to load). By far the better solution here is to set up your own local reference cache for large-scale usage, or to manually specify the reference files for ad hoc usage.

jkbonfield avatar Sep 30 '24 08:09 jkbonfield

> By far the better solution here is to set up your own local reference cache for large-scale usage, or to manually specify the reference files for ad hoc usage.

@jkbonfield Would be great if you can point me to the documentation about this. Thank you!

lcscs12345 avatar Sep 30 '24 09:09 lcscs12345

The EBI's official documentation on this is here: https://ena-docs.readthedocs.io/en/latest/retrieval/programmatic-access/cram-reference-cache.html
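
As a minimal sketch of the two approaches (a local reference cache, or naming the reference by hand), assuming a POSIX shell: REF_PATH and REF_CACHE are the htslib environment variables, and seq_cache_populate.pl ships in the samtools misc/ directory. The cache location and the REF_FASTA and IN_CRAM variables are placeholders made up for this example, so adjust them to your own setup.

```shell
# Hypothetical cache location; any writable directory will do.
CRAM_CACHE_ROOT="${CRAM_CACHE_ROOT:-$HOME/.cache/hts-ref}"
mkdir -p "$CRAM_CACHE_ROOT"

# htslib looks references up by MD5. Each %2s consumes two digest characters
# and %s the remainder, so files are spread over two directory levels.
export REF_CACHE="$CRAM_CACHE_ROOT/%2s/%2s/%s"
# Local lookups only: no EBI URL is listed, so nothing is fetched remotely.
export REF_PATH="$REF_CACHE"

# Large-scale use: fill the cache once from a local FASTA.
# REF_FASTA is a placeholder variable; the step is skipped if it is unset.
if [ -n "${REF_FASTA:-}" ] && command -v seq_cache_populate.pl >/dev/null 2>&1; then
    seq_cache_populate.pl -root "$CRAM_CACHE_ROOT" "$REF_FASTA"
fi

# Ad hoc use: name the reference explicitly with -T (IN_CRAM is a placeholder).
if [ -n "${REF_FASTA:-}" ] && [ -n "${IN_CRAM:-}" ] && command -v samtools >/dev/null 2>&1; then
    samtools view -T "$REF_FASTA" -h "$IN_CRAM" | head -n 5
fi
```

Leaving the EBI URL out of REF_PATH is a deliberate choice in this sketch: a missing reference then fails immediately instead of triggering a slow download.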

jkbonfield avatar Sep 30 '24 10:09 jkbonfield