
Does BCFTools merge -m all also merge partially overlapping variants with different starting positions?

Open WimS83 opened this issue 10 years ago • 6 comments

Hi,

Does BCFTools merge -m all also merge partially overlapping variants?

When I merge two multi-sample VCF files with bcftools and then convert the complex variants in the merged file to primitives (vcfallelicprimitives), I get duplicate variant records (identical CHROM, POS, REF, ALT).
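For anyone who wants to check their own file for this symptom, here is a minimal Python sketch that reports keys seen more than once. The function name `duplicate_sites` and the way the records are passed in as plain text lines are assumptions made for illustration; it is not part of bcftools or vcflib, and it only looks at the first five columns.

```python
from collections import Counter


def duplicate_sites(vcf_lines):
    """Return the (CHROM, POS, REF, ALT) keys that occur more than once.

    Hypothetical helper for illustration: header lines starting with '#'
    and blank lines are skipped, and only the first five columns are read.
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split()[:5]
        counts[(chrom, int(pos), ref, alt)] += 1
    return [key for key, n in counts.items() if n > 1]


# The two records from the primitives output quoted later in this thread.
records = [
    "Chr1 9991617 . G C 513.815",
    "Chr1 9991617 . G C 2227.48",
]
print(duplicate_sites(records))
```

On a real file the lines would come from iterating over the opened VCF rather than from a hard-coded list.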

This seems to be caused by bcftools not merging variant records from the two VCF files that only partially overlap, i.e. the variants do not share a starting position, but their starting positions are within a few bases of each other and the complex alleles overlap.

To solve this problem I currently have to process the merged vcf file with vcfcreatemulti from the vcflib package. https://github.com/ekg/vcflib#vcfcreatemulti

I did expect "bcftools merge -m all" to have the functionality to merge variant records based on partially overlapping alleles, without having to have the exact same starting position for each variant record.

Can you confirm whether or not this is the case? And if it is not supported, is this something you are planning to add to bcftools?

Thank you!

WimS83 Sep 29 '15 11:09

To give an example: in my merged VCF file, after conversion to primitives, I have the following variant records:

Chr1 9991617 . G C 513.815
Chr1 9991617 . G C 2227.48

These lead back to the following two variant records in my merged VCF file, which partially overlap and were not merged by bcftools:

Chr1 9991615 . GGG GGC 513.815
Chr1 9991617 . G C 2227.48

These two variants in turn trace back to the two VCF files that I merged, one from each file.
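To make it concrete why the GGG to GGC record ends up as a second G to C at 9991617, here is a rough Python sketch of splitting an equal-length substitution into single-base changes. This is a simplification written for illustration, not the actual vcfallelicprimitives algorithm; the name `mnp_to_primitives` is hypothetical, and alleles of unequal length (indels) are deliberately not handled.

```python
def mnp_to_primitives(pos, ref, alt):
    """Split an equal-length REF/ALT pair into single-base substitutions.

    Returns (position, ref_base, alt_base) tuples for every mismatching
    base. Simplified sketch: indels would need a proper alignment step.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch only covers equal-length alleles")
    return [
        (pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base
    ]


# The complex record from the merged file: only the third base differs.
print(mnp_to_primitives(9991615, "GGG", "GGC"))  # [(9991617, 'G', 'C')]
```

The single mismatch sits two bases after the record start, so the primitive lands on exactly the same site as the other file's G to C record.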

vcfcreatemulti merges these two records into:

Chr1 9991615 . GGG GGC,GGC 513.815
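The kind of grouping I mean by "partially overlapping" can be sketched in Python as follows. It assumes a record covers the closed interval from POS to POS + len(REF) - 1 and groups records on the same chromosome whose intervals intersect. The function `overlap_groups` is a hypothetical illustration only; it is not how vcfcreatemulti or bcftools are implemented, and it does not rewrite the alleles onto a common REF.

```python
def overlap_groups(records):
    """Group (chrom, pos, ref) records whose reference spans overlap.

    Assumption: a record covers the closed interval
    [pos, pos + len(ref) - 1] on its chromosome.
    """
    groups = []
    current, current_chrom, current_end = [], None, -1
    for chrom, pos, ref in sorted(records):
        end = pos + len(ref) - 1
        if current and chrom == current_chrom and pos <= current_end:
            # Starts inside the span of the running group: same group.
            current.append((chrom, pos, ref))
            current_end = max(current_end, end)
        else:
            if current:
                groups.append(current)
            current, current_chrom, current_end = [(chrom, pos, ref)], chrom, end
    if current:
        groups.append(current)
    return groups


# The two records from this thread plus one unrelated, made-up site.
records = [
    ("Chr1", 9991617, "G"),
    ("Chr1", 9991615, "GGG"),
    ("Chr1", 9991700, "A"),
]
for group in overlap_groups(records):
    print(group)
```

A merge keyed only on identical CHROM and POS would treat 9991615 and 9991617 as unrelated sites, whereas the span-based grouping above puts them together.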

WimS83 Sep 29 '15 12:09

Hi, that is correct, currently only matching positions are considered. There is a plan (and unfinished code) to handle these cases.

pd3 Sep 29 '15 12:09

OK, thank you for the information. Good to know that this is the case and that you are working on it.

WimS83 Sep 29 '15 12:09

Is anyone still actively working on this? If not, could I get whatever code does exist so that I can finish it up? I desperately need the functionality, or at least a way to fix the reference depth in partially overlapping variants.

Nheyer Jul 18 '19 19:07

I have a C++ file that spreads the DP and RO to overlapped regions, so that using bcftools norm -m +any will fix the problem in most cases. But if someone could point me to the unfinished code mentioned, I would love to finish it up. @pd3

Nheyer May 26 '20 18:05

Sorry, this got buried and never climbed anywhere near my priority list. At some point atomization and deatomization of variants is planned for bcftools norm.

pd3 May 28 '20 15:05