bcftools norm -m+ doesn't expand the FORMAT/GL field

Open Nheyer opened this issue 5 years ago • 5 comments

When bcftools norm is used to combine rows, the FORMAT/GT field is updated as the genotypes are expanded, but the GL field is left unchanged. The result is a malformed VCF file (see VCF specification section 1.6.2, GL), and any downstream analysis, or even an attempt to un-normalize the VCF, fails. This issue will likely also affect the reserved tags PL, PP, and GP.

The workaround for now is to simply remove the offending FORMAT tags with bcftools annotate -x FORMAT/GL

Nheyer avatar Jun 16 '20 23:06 Nheyer

How is the field defined in the header? If it is Number=G, as it should be, then it should work. If not, please provide a small test case to reproduce the problem

pd3 avatar Jun 17 '20 14:06 pd3
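For readers following along, here is a minimal Python sketch of what Number=G implies for a record: the number of values per sample should equal the number of possible genotypes given the allele count and ploidy. This is not bcftools code. The helper names `expected_number_g` and `check_gl_lengths` are hypothetical, and the sketch assumes a tab-separated VCF data line with GT present in FORMAT.

```python
from math import comb


def expected_number_g(n_alt, ploidy):
    """Number of possible genotypes for n_alt ALT alleles plus REF.

    This is the count of multisets of size `ploidy` drawn from the
    alleles, which is what a Number=G field should contain.
    """
    n_alleles = n_alt + 1  # ALT alleles plus the REF allele
    return comb(n_alleles + ploidy - 1, ploidy)


def check_gl_lengths(vcf_line, tag="GL"):
    """Return (sample_index, found, expected) for each mismatching sample.

    Hypothetical helper: assumes a single tab-separated VCF data line
    whose FORMAT column includes GT and the given Number=G tag.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]
    n_alt = 0 if alt == "." else len(alt.split(","))
    keys = fields[8].split(":")
    if tag not in keys or "GT" not in keys:
        return []
    problems = []
    for i, sample in enumerate(fields[9:]):
        by_key = dict(zip(keys, sample.split(":")))
        value = by_key.get(tag, ".")
        if value == ".":
            continue  # a missing value is allowed
        ploidy = len(by_key["GT"].replace("|", "/").split("/"))
        expected = expected_number_g(n_alt, ploidy)
        found = len(value.split(","))
        if found != expected:
            problems.append((i, found, expected))
    return problems
```

With a made-up diploid record that has two ALT alleles but still carries three GL values, `check_gl_lengths` reports one mismatch with six values expected, which is the shape of the problem described in this issue.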

It is G. The offending section is vcfnorm.c line 1590. I have also included an example that breaks it.

Steps to reproduce:

  1. run bcftools norm -m+any minimal.vcf -o test.vcf
  2. try to trim alt alleles with bcftools view -a test.vcf

examples.zip

Nheyer avatar Jun 22 '20 19:06 Nheyer

@pd3

Nheyer avatar Jul 03 '20 23:07 Nheyer

Thank you for the test case. Unfortunately, the program does not handle general de/atomization situations like this yet; it can only perform an inverse of simplified split operations, e.g. two 0/1 and 1/0 records joined into 1/2, but not three records with ./. genotypes. There is a plan to generalize this and some preliminary work has been done in this regard, but it is not a simple task and is not a top priority at the moment. I'll mark this as a feature request.

pd3 avatar Jul 06 '20 07:07 pd3
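To illustrate the simplified inverse-of-split case described above, here is a rough Python sketch that joins one biallelic GT per split record into a single multiallelic GT. It is an illustration only, not how vcfnorm.c does it. The function name `join_split_genotypes` is hypothetical, and the sketch assumes the records are given in ALT order and deliberately rejects missing (./.) genotypes, which is the case the program does not yet handle.

```python
def join_split_genotypes(gts):
    """Join biallelic GT strings (one per split record) into one GT.

    Assumes record i (1-based) carries ALT allele i of the joined site,
    so a "1" in that record becomes allele index i after joining.
    """
    if not gts:
        raise ValueError("need at least one genotype")
    sep = "|" if "|" in gts[0] else "/"
    merged = None
    for alt_index, gt in enumerate(gts, start=1):
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            raise ValueError("missing genotypes are outside this sketch")
        if merged is None:
            merged = [0] * len(alleles)
        if len(alleles) != len(merged):
            raise ValueError("ploidy differs between records")
        for pos, allele in enumerate(alleles):
            if allele == "0":
                continue
            if allele != "1":
                raise ValueError("expected biallelic genotypes")
            if merged[pos] != 0:
                raise ValueError("two records claim the same allele slot")
            merged[pos] = alt_index
    if sep == "/":
        merged.sort()  # allele order carries no meaning when unphased
    return sep.join(str(a) for a in merged)
```

Calling `join_split_genotypes(["0/1", "1/0"])` gives `1/2`, matching the example in the comment above. A Number=G field such as GL would additionally need to grow from three to six values for a diploid sample at such a site, which this sketch does not attempt.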

OK, I will work on it then!

Nheyer avatar Jul 06 '20 16:07 Nheyer