bcftools

Slice VCF/BCF files

Open hzpc-joostk opened this issue 7 years ago • 0 comments

I regularly find myself extracting chunks from a variant file for educational, demo, review or testing purposes. For this I use, for example, bcftools view -Ob -r Chr.8:1-1000000. This nicely jumps to the right location in the (indexed) file and streams records until the end of that region. Depending on the file's size, this may take quite some time, as all records in that region are parsed (and optionally converted VCF->BCF->VCF). (At most 70 MB/sec with 16 additional threads.)

It would be nice to just slice this part out of the file using the index, leaving the BGZF blocks in between as-is. bcftools concat --naive already does this for whole files, but it does not support --naive and --regions simultaneously.
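To illustrate what I mean by leaving the BGZF blocks as-is, here is a rough Python sketch. It is not bcftools or htslib code, and all function names are made up for the example. It relies only on the BGZF layout from the SAM specification: each block is a gzip member carrying a "BC" extra subfield with the block size minus one, and an index virtual offset is (compressed offset << 16) | offset within the uncompressed block. The sketch copies whole blocks only; a real slice would still have to emit the header and recompress the first and last blocks when the region starts or ends mid-block.

```python
import gzip
import struct
import zlib


def bgzf_block(data: bytes) -> bytes:
    """Pack `data` into one BGZF block (gzip member with a 'BC' extra subfield)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    bsize = 12 + 6 + len(cdata) + 8 - 1  # header + extra + payload + trailer, minus 1
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    extra = struct.pack("<BBHH", 66, 67, 2, bsize)  # 'B', 'C', SLEN=2, BSIZE
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


def split_virtual_offset(voffset: int) -> tuple[int, int]:
    """Split an index virtual offset into (compressed offset, offset inside block)."""
    return voffset >> 16, voffset & 0xFFFF


def block_size(buf: bytes, coffset: int) -> int:
    """Read the total size of the BGZF block starting at `coffset`, without inflating it."""
    id1, id2, cm, flg = struct.unpack_from("<BBBB", buf, coffset)
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("not a BGZF block")
    (xlen,) = struct.unpack_from("<H", buf, coffset + 10)
    pos = coffset + 12
    end = pos + xlen
    while pos < end:
        si1, si2, slen = struct.unpack_from("<BBH", buf, pos)
        if (si1, si2, slen) == (66, 67, 2):
            (bsize,) = struct.unpack_from("<H", buf, pos + 4)
            return bsize + 1
        pos += 4 + slen
    raise ValueError("no BC subfield found")


def copy_blocks(buf: bytes, first: int, last: int) -> bytes:
    """Copy the blocks from compressed offset `first` through the block at `last`, untouched."""
    end = last + block_size(buf, last)
    pos = first
    while pos < end:  # walk block by block to check both offsets are block starts
        pos += block_size(buf, pos)
    if pos != end:
        raise ValueError("offsets do not fall on block boundaries")
    return buf[first:end]


# Toy "file": four blocks of fake records (no real VCF/BCF content).
chunks = [f"chr8\t{i}\trecord\n".encode() * 50 for i in range(4)]
blocks = [bgzf_block(c) for c in chunks]
data = b"".join(blocks)
offsets = [sum(len(b) for b in blocks[:i]) for i in range(len(blocks))]

# Slice blocks 1..2 by compressed offset only, then append an empty block as EOF marker.
sliced = copy_blocks(data, offsets[1], offsets[2]) + bgzf_block(b"")
```

Because the copied blocks are still valid gzip members, gzip.decompress(sliced) gives back the records of blocks 1 and 2 even though nothing in between was inflated or deflated. That is the whole point of the request: the cost becomes mostly I/O instead of parsing and recompression.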

Would it be possible to implement a bcftools slice command for this? The output type is expected to be the same as the input: compressed VCF/BCF. Optionally, copy and slice the index as well (if --output writes to a file).

About:   Slice indexed VCF/BCF files without recompression.
Usage:   bcftools slice [options] <in.bcf>|<in.vcf.gz> [region1 [...]]

Options:
       --no-version               Do not append version and command line to the header
   -o, --output <file>            Write output to a file [standard output]
   -r, --regions <region>         Restrict to comma-separated list of regions
   -R, --regions-file <file>      Restrict to regions listed in a file

hzpc-joostk avatar Aug 27 '18 09:08 hzpc-joostk