vt icon indicating copy to clipboard operation
vt copied to clipboard

vt estimate converts QUAL to '-nan' for multi-allelic sites

Open oleraj opened this issue 7 years ago • 0 comments

Hi,

I noticed recently that when I use vt estimate command, the QUAL score is convert to -nan for multi-allelic sites. This is fine with variants with single allele. An example variant is here:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	S1	S2	S3
21	10906293	rs28616746	T	C,G	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum	GT:AD:DP:GQ:PL	1/2:0,133,52:185:99:5668,1373,975,4289,0,4133	1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242	1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911

After vt estimate -e AB it looks like this:

21	10906293	rs28616746	T	C,G	-nan	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;HWEAF=0.5,0.5;HWEGF=0,0,0.25,0,0.5,0.25;MLEAF=0,1;AB=0.0985985	GT:AD:DP:GQ:PL	1/2:0,133,52:185:99:5668,1373,975,4289,0,4133	1/2:0,146,63:209:99:6136,1706,1270,4431,0,4242	1/2:0,196,66:262:99:8107,1791,1159,6111,0,5911

AB field is computed fine, but the QUAL is converted to -nan, which is not a valid value for QUAL column and this has issues when I try to run GATK VariantsToTable in a subsequent step.

Since I only saw this with multi-allelic variants, I thought maybe if I decomposed and normalized the variants first that might help. Decomposing and normalizing retains the QUAL score in tact:

bcftools view $vcf 21:10906293 | vt decompose - | vt normalize -r $ref - | grep -v "^#"
21	10906293	rs28616746	T	C	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G	GT:DP:PL	1/.:185:5668,1373,975	1/.:209:6136,1706,1270	1/.:262:8107,1791,1159
21	10906293	rs28616746	T	G	141379	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G	GT:DP:PL	./1:185:5668,4289,4133	./1:209:6136,4431,4242	./1:262:8107,6111,5911

However, when I use that as input to vt estimate, the results are worse -- the QUAL score is still converted to either -0 or 5 and AB is now -nan:

21	10906293	rs28616746	T	C	-0	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan	GT:DP:PL	1/.:185:5668,1373,975	1/.:209:6136,1706,1270	1/.:262:8107,1791,1159
21	10906293	rs28616746	T	G	5	PASS	AC=3,3;AF=0.5,0.5;AN=6;BaseQRankSum=-6.749;ClippingRankSum=0.028;DB;DP=656;ExcessHet=3.1439;FS=2.16;InbreedingCoeff=-0.0625;MQ=56.68;MQRankSum=-2.894;NEGATIVE_TRAIN_SITE;QD=30.86;ReadPosRankSum=1.01;SOR=1.395;VQSLOD=-0.3747;culprit=MQRankSum;OLD_MULTIALLELIC=21:10906293:T/C/G;HWEAF=-nan;HWEGF=-nan,-nan,-nan;MLEAF=-nan;AB=-nan	GT:DP:PL	./1:185:5668,4289,4133	./1:209:6136,4431,4242	./1:262:8107,6111,5911

Any help you can provide or suggestions on how to avoid converting the QUAL score to invalid values?

Thanks!

Andrew

oleraj avatar Oct 22 '18 19:10 oleraj