Autoindex should parse tabix-indexed monolithic VCFs in parallel
We've had a few users complain that autoindex's chunking step is excessively slow for VCFs provided as a single file covering all chromosomes (e.g. https://github.com/vgteam/vg/issues/4274). The cause is a single-threaded linear scan over the VCF that parcels it out into chunks; only the work on those chunks afterwards runs in parallel. If the VCF is tabix-indexed, it should be possible to chunk it in parallel across chromosomes, which would alleviate this issue.
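
To make the idea concrete, here is a rough sketch of per-chromosome chunking driven by the tabix index. It is an illustration only, not vg's implementation: vg is C++ and would presumably go through htslib directly, whereas this sketch shells out to the `tabix` command-line tool (`tabix -l` to list the indexed sequence names, `tabix -h FILE REGION` to print the header plus one region's records). The function names, the `lister`/`extractor` parameters, and the one-file-per-contig output layout are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_contigs(vcf_path):
    """Return the sequence names stored in the tabix index (via `tabix -l`)."""
    result = subprocess.run(
        ["tabix", "-l", str(vcf_path)],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def extract_contig(vcf_path, contig, out_dir):
    """Write the VCF header plus one contig's records to its own file."""
    # Assumption: the contig name is safe to use as a file name.
    out_path = Path(out_dir) / f"{contig}.vcf"
    with open(out_path, "w") as out:
        subprocess.run(
            ["tabix", "-h", str(vcf_path), contig],
            check=True, stdout=out,
        )
    return out_path


def chunk_by_contig(vcf_path, out_dir, threads=4,
                    lister=list_contigs, extractor=extract_contig):
    """Split an indexed VCF into per-contig files, several contigs at a time.

    Each worker seeks straight to its contig through the index, so no
    single thread has to scan the whole file first.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    contigs = lister(vcf_path)
    # Threads are enough here: the heavy lifting happens in the tabix
    # subprocesses, and the Python side mostly waits on them.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        paths = list(pool.map(
            lambda contig: extractor(vcf_path, contig, out_dir), contigs))
    return dict(zip(contigs, paths))
```

With `tabix` on the `PATH`, something like `chunk_by_contig("variants.vcf.gz", "chunks", threads=8)` would produce one uncompressed VCF per indexed contig. The `lister` and `extractor` hooks exist only so the scheduling logic can be exercised without a real VCF; contig names that contain `:` or `/` would need extra handling that this sketch leaves out.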