Negative count in genotype output of Graphtyper genotype_sv
Hi,
I have a problem with graphtyper v2.7.7 while recalling SVs for a number of individuals. First, I merged the SVs from these individuals with svimmer. I then used the svimmer output to recall SVs from the raw CRAM files per chromosome, and concatenated the sub-files with the graphtyper vcf_concatenate -sv command. As a final step, I want to merge the files into one aggregated VCF file. During that step, I get the following error:
<error> FATAL ERROR - SNP-HWE: Current genotype configuration (%d %d %d ) includes a negative count-1359751756 332284134 1018685522
Do you know how to fix that error? I have no idea where the negative count comes from.
Help would be appreciated.
Edit:
I checked all the genotypes in the VCF files I used. No negative values are present. My suspicion is that the negative value is caused by an integer overflow, because of the size of the numbers reported in the error message: both the first and the last number have 10 digits. The error message is emitted by the function p_hwe_excess_het (https://github.com/DecodeGenetics/graphtyper/blob/49643915ed69a20d408d5758afdb62dbd88c4d33/src/utilities/snp_hwe.cpp#L19C1-L32C24), which takes three inputs of type int (32-bit, I assume). If the first value exceeded 2,147,483,647, it would wrap around to a negative value and trigger the error message.
My main question is: why would the value be so high?
As additional information: I am merging 20 VCF files with 100 individuals each.
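As a rough sanity check on the overflow idea, here is a minimal Python sketch (not graphtyper code). It assumes the usual two's-complement wraparound of a 32-bit signed int and asks which true value would have to wrap to produce the first number in the error message. The helper name `wrap_int32` is made up for illustration.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap_int32(value: int) -> int:
    """Interpret an arbitrary integer as a two's-complement 32-bit signed int."""
    return (value + 2**31) % 2**32 - 2**31

# Largest genotype count this merge could legitimately produce:
# 20 files x 100 individuals (figures from this thread).
max_expected_count = 20 * 100

# First value printed in the error message.
reported = -1359751756

# Smallest non-negative value that would wrap around to the reported number.
needed_true_value = reported + 2**32

print(wrap_int32(INT32_MAX + 1) == INT32_MIN)    # one past the maximum wraps to the minimum
print(wrap_int32(needed_true_value) == reported)
print(needed_true_value)                          # 2935215540
print(needed_true_value > max_expected_count)
```

Under these assumptions, a count of roughly 2.9 billion would be needed, compared with at most 2,000 genotypes per variant, so wraparound from a legitimate genotype count alone seems unlikely to explain the number.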
Edit2:
I checked the VCF files and found no genotype stats (number of het, homRef, homAlt) either below 0 or over 100 (which is the expected number of individuals).
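A check like this can be scripted. The sketch below uses only the Python standard library and reads plain-text VCF lines. The INFO keys `NHomRef`, `NHet` and `NHomAlt` are an assumption about how the genotype stats are named in the files, so adjust them to whatever the header actually declares. The function names are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'A=1;B=2;FLAG' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info

def out_of_range_records(vcf_lines, n_samples,
                         keys=("NHomRef", "NHet", "NHomAlt")):  # assumed key names
    """Return (chrom, pos, key, raw_value) for stats below 0 or above n_samples."""
    flagged = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # INFO is the 8th VCF column
        for key in keys:
            if key not in info:
                continue
            # A field may hold several comma-separated values; "." means missing.
            values = [int(v) for v in info[key].split(",") if v != "."]
            if any(v < 0 or v > n_samples for v in values):
                flagged.append((fields[0], int(fields[1]), key, info[key]))
    return flagged

# Tiny made-up example with one deliberately broken record.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\t<DEL>\t.\tPASS\tNHomRef=90;NHet=8;NHomAlt=2\n",
    "chr1\t200\t.\tC\t<DUP>\t.\tPASS\tNHomRef=-5;NHet=8;NHomAlt=2\n",
]
print(out_of_range_records(demo, n_samples=100))
```

For compressed files, the same function can be fed lines from `gzip.open(path, "rt")`.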
This is likely related to the vcf_merge issue in #166: due to some memory error, these integers contain uninitialized or garbage values.
I don't think it matters, but I would rather merge the 1 Mbp graphtyper VCFs first and then concatenate.
Best, Hannes
Hi Hannes,
Thank you for your suggestions. I will try merging the 1 Mbp graphtyper VCFs and check whether the error persists.
Best,
Sebastian
Hi Hannes,
I tried merging the 1 Mbp graphtyper VCFs and still get similar errors. In some cases I get the same negative-count error; in others I get empty logs but exit code 1, with the same set of files. I can only imagine that in some cases the process exits before the log can be written?
Another thought: when I merge multiple VCFs, do they all have to contain the same number of individuals? Would it cause any problems if some files have information on more individuals?
Regarding your comment in #166, if graphtyper gets the same input from svimmer, the variants in the outputs should be the same, right?
Best,
Sebastian
Hi, it's fine to have different numbers of individuals.
> Regarding your comment in https://github.com/DecodeGenetics/graphtyper/issues/166, if graphtyper gets the same input from svimmer, the variants in the outputs should be the same, right?
Yes, the variants should be the same (and in the same order).
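To confirm that two files really list the same variants in the same order before merging, a small comparison along these lines may help. It is only a sketch using the Python standard library: it compares the first five VCF columns (CHROM, POS, ID, REF, ALT) record by record, and the function names are hypothetical.

```python
from itertools import zip_longest

def variant_keys(vcf_lines):
    """Collect (CHROM, POS, ID, REF, ALT) for every record, in file order."""
    keys = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.append((chrom, int(pos), vid, ref, alt))
    return keys

def first_mismatch(lines_a, lines_b):
    """Return (index, key_a, key_b) for the first differing record, else None."""
    pairs = zip_longest(variant_keys(lines_a), variant_keys(lines_b))
    for index, (a, b) in enumerate(pairs):
        if a != b:  # also catches one file ending before the other
            return index, a, b
    return None

# Made-up records: file_b is missing the second variant.
file_a = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\tsv1\tA\t<DEL>\n",
    "chr1\t200\tsv2\tC\t<DUP>\n",
]
file_b = file_a[:2]

print(first_mismatch(file_a, file_a))
print(first_mismatch(file_a, file_b))
```

A result of `None` means the variant lists match. Anything else gives the position of the first record where the two files diverge.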