Skip to last record using tabix
I have many large (indexed) VCFs of the form ${CHROM}_${CHUNK}.vcf.gz and was looking for a quick way to get the coordinates spanned by the file. I know that, given a region, the index can be used to skip to the chunks overlapping that region, but is the reverse possible? Can I use the last entry in the index to get the offset to the last chunk?
was looking for a quick way to get the coordinates spanned by the file.
Do you mean something like:
chr1 61772 17129271
chr2 262 6221917
Can I use the last entry in the index to get the offset to the last chunk?
This is a different request. Do you actually need the file offset? It wouldn't make much sense to have it displayed by tabix, but it could be returned by a HTSlib method.
In my use case I know that the file doesn't span multiple chromosomes, but yes, that's the idea. My (admittedly poor) understanding of the tabix format (for bcf/vcf files) is that it stores the (genomic) coordinate of the first record in each chunk.
This is a different request. Do you actually need the file offset? It wouldn't make much sense to have it displayed by tabix, but it could be returned by a HTSlib method.
I agree that having tabix export the file offset of the last chunk would be a weird piece of functionality, and I was thinking it would make more sense as an HTSlib method. Now that you mention it, though, I feel like a tabix view or tabix export command that wrote out the contents of the index file as JSON (or something similar) could be useful in a lot of settings.
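To make the idea concrete, here is a minimal sketch of what reading the index directly could look like. It is not an HTSlib or tabix feature. It assumes the .tbi layout described in the tabix format specification (a BGZF-compressed file with a header, per-contig bins of chunk offsets, and a 16 kb linear index), and it assumes that bin 37450 is the metadata pseudo-bin that HTSlib writes. The function names parse_tbi, split_voffset and load_tbi are made up for this example.

```python
import gzip
import struct

# Assumption: HTSlib stores per-contig metadata in this pseudo-bin,
# so its "chunks" are not ordinary record ranges and are skipped here.
PSEUDO_BIN = 37450


def parse_tbi(data):
    """Parse decompressed .tbi bytes.

    Returns {contig: (last_linear_voffset, max_chunk_end_voffset)},
    where both values are BGZF virtual offsets.
    """
    if data[:4] != b"TBI\x01":
        raise ValueError("not a tabix index")
    # Header: n_ref, format, col_seq, col_beg, col_end, meta, skip, l_nm
    n_ref, _fmt, _seq, _beg, _end, _meta, _skip, l_nm = struct.unpack_from(
        "<8i", data, 4
    )
    pos = 36
    names = data[pos:pos + l_nm].split(b"\x00")[:n_ref]
    pos += l_nm

    result = {}
    for name in names:
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        max_end = 0
        for _ in range(n_bin):
            bin_id, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8
            for _ in range(n_chunk):
                _chunk_beg, chunk_end = struct.unpack_from("<QQ", data, pos)
                pos += 16
                if bin_id != PSEUDO_BIN:
                    max_end = max(max_end, chunk_end)
        # Linear index: one virtual offset per 16 kb genomic window.
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4
        ioffs = struct.unpack_from("<%dQ" % n_intv, data, pos)
        pos += 8 * n_intv
        result[name.decode()] = (max(ioffs) if ioffs else 0, max_end)
    return result


def split_voffset(voffset):
    """Split a virtual offset into (compressed block offset, offset within block)."""
    return voffset >> 16, voffset & 0xFFFF


def load_tbi(path):
    """Read a .tbi file from disk; BGZF is gzip-compatible, so gzip can decode it."""
    with gzip.open(path, "rb") as fh:
        return parse_tbi(fh.read())
```

With something like this, load_tbi("file.vcf.gz.tbi") would give, per contig, a virtual offset near the last indexed window; the first element of split_voffset is the byte position of a BGZF block that one could seek to and read forward from to reach the last record. Note that the index only gives file offsets, so the genomic end coordinate still has to be read from the record itself.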