SVs in input VCF missing from graphtyper output VCF
Hi,
I used manta v1.6.0 for SV discovery and svimmer to create a multi-sample VCF as the input for graphtyper v2.7.5 (graphtyper genotype_sv).
I found that some SVs in my input VCF file are missing from the genotype_sv output VCF file. I've checked the missing SVs, and all of them are longer than 50 bp. I've also checked the ALT field of the missing SVs: some have explicit sequences, while others are symbolic alleles (<DEL>, <INS>, <DUP:TANDEM>).
Could you suggest why some SVs are missing? Is there a filtering step in graphtyper that I may have overlooked? Any insight or help is appreciated. Let me know if you'd like to take a look at the input and output VCF files.
Thank you, Tingting
Hi, sorry for the late response, I have been on a long leave. Could you share an example input VCF and the commands that you are running?
@hannespetur
I am seeing a similar issue. Some of the svimmer input variants are not found in the graphtyper output.
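For anyone who wants to run the same check on their own data, here is a minimal sketch of how the comparison could be done. It assumes that an input SV and its genotyped counterpart share the same CHROM, POS and SVTYPE. That matching rule is an assumption, not documented graphtyper behaviour. The function names are hypothetical.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def sv_keys(lines):
    """Yield (CHROM, POS, SVTYPE) for every record that carries an SVTYPE tag."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        # INFO is the 8th column; flags without "=" get an empty value
        info = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in fields[7].split(";")
        )
        if "SVTYPE" in info:
            yield (fields[0], int(fields[1]), info["SVTYPE"])


def missing_svs(input_lines, output_lines):
    """Return input SV keys that have no record with the same key in the output.

    Assumption: matching on (CHROM, POS, SVTYPE) is good enough. If the
    genotyper shifts or normalizes positions, a window-based match is needed.
    """
    genotyped = set(sv_keys(output_lines))
    return [key for key in sv_keys(input_lines) if key not in genotyped]


# Usage with files (paths are placeholders):
#   with open_vcf("svimmer.chr22.vcf.gz") as a, open_vcf("results.chr22.vcf.gz") as b:
#       for chrom, pos, svtype in missing_svs(a, b):
#           print(chrom, pos, svtype, sep="\t")
```

Because the key ignores REF and ALT, one input SV that graphtyper reports as several records still counts as present.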
Below is the list of svimmer input variants that are affected (DELs, plus one INS). Nearly all of the CIGARs have a 1M followed by an insertion and then a deletion; the one exception is the 391 bp deletion at chr22:16326624 (CIGAR=1M391D). Most are DELs between 50 and 53 bp. The 30x HGDP+1kg CRAMs were genotyped with this Nextflow script:
NFILES=$(($task.cpus+$task.cpus))
echo "NFILES: $NFILES"
graphtyper genotype_sv $launchDir/ref/Homo_sapiens_assembly38.fasta $svimmer \
  --sams=${cram_list} --threads=$task.cpus \
  --max_files_open=$NFILES --verbose --region=$region --output="results"
This was a test run on 10 CRAMs with the chr22 svimmer input, which is based on over 48k CRAMs. There are 12,838 DELs in the chr22 svimmer vcf.gz.
These are the variants that were missing from the graphtyper output.
chr22 10950658 . AGACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG AAGACCA 0 . END=10950710;SVTYPE=DEL;SVLEN=-52;CIGAR=1M6I52D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 11342195 . CCTTTGAGACAAAACTTCCAGAGGAACGATCAAGCAGCAGCATTTGCACTTCACC CGGAAAAA 0 . END=11342249;SVTYPE=DEL;SVLEN=-54;CIGAR=1M7I54D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 11455434 . TATGAGGGACAAACATTCAGACCACGGGAGCAGTGTTCTGGAATCCTACGT TGA 0 . END=11455484;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=3;STDDEV_POS=30.04,3.06
chr22 11460476 . GAGACAAACATTCAGACCACAGCAGGAGTGTTCTGGAGTCCTATGTGAGGG GGT 0 . END=11460526;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 11901042 . TTTCGTCCATTCATTTGGTGATGGACATGTAGGTTGATTCCATACACAAGC TGGA 0 . END=11901092;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 11952808 . CATGATGGAAACCACAAGGCCAGTCCATGACTAGCTACACACATTGACATC CTA 0 . END=11952858;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 12199444 . CATTATTCTATCCAAATTGGCTTATCTGTTAACCATTTTAAAGGTATAGGTTTTGGAG CTAGCTGCCTAGCA 0 . END=12199501;SVTYPE=DEL;SVLEN=-57;CIGAR=1M13I57D;NUM_MERGED_SVS=19;STDDEV_POS=0.00,0.00
chr22 15359802 . ACGCGAGGGGCAAATATTCATGACCTCGTAGCAGTGTTCTGGAATCCTATG ATA 0 . END=15359852;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=3;STDDEV_POS=0.00,0.00
chr22 15391405 . TCTTATGCGGGGGACAAACACTCAGAACCCAGCAGCAGTGTTCTGGAATCC TT 0 . END=15391455;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 15881986 . GACCCTGGCGTCCCTGTTTCGAGTCCAGTGTGCGCCTAGGGTGGCTAGGGA GG 0 . END=15882036;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 16259516 . TCGATGATGATTCCATTTGAGTCCATTCGATGATTCCATTCGATTCCATGCA TTG 0 . END=16259567;SVTYPE=DEL;SVLEN=-51;CIGAR=1M2I51D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 16263962 . TTGATTCCATTCGTTGATGATTCCATTCGAGTCCATTCTCAGATTCCATTA TC 0 . END=16264012;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=165;STDDEV_POS=74.45,27.40
chr22 16306114 . CGATTCCATTTGATGATGATTCTATTTGAGTCCATTCGATGATTCCATTTG CT 0 . END=16306164;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 16326624 . ATCGATTCCATTCAATGATGATTCAGTTCGAGTCCATTCAATGATTCCATTCGATTCCATTCGATGATGATTTCATTCTAGTCCATTCAATGATTCCATTGGATTCCATTCAATGATGATTCCATCCAATGCCATTTGATGATTTCATTCGACTTCGTTTGATGATAATTCCATTCGATTCCACTCNATGATTCCATTGGATTCCATTCAATGATCATTCCTTTCAATTCCAATCGATGTTTCCATTCAATTCATTCGATGATGATTCCATTTGATTCCATTCGATGACTCCATTCGGGTCCGTTCAATTATTCCATTCGATCCCATCCCATGATGATTCCATTCGAGTCCATTCGGTGATGATTCCATTCGATTCAATTCGATGACTCAAT A 0 . END=16327015;SVTYPE=DEL;SVLEN=-391;CIGAR=1M391D;CIPOS=0,18;HOMLEN=18;HOMSEQ=TCGATTCCATTCAATGAT;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 16906741 . GGGAAATTCCAAAACTTTAGGAAATCTTTCAATTCCCTTTGCCGATCTTTCTTAGATTTGATTTTA GATTAATTTTCATAATTTAAT 0 . END=16906806;SVTYPE=DEL;SVLEN=-65;CIGAR=1M20I65D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 17642713 . CCTTCCAAGTAGCTGGGATTATGGGCACACACCACCATACCCAGCTAATTTTTT CTGGGAGAA 0 . END=17642766;SVTYPE=DEL;SVLEN=-53;CIGAR=1M8I53D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 17756434 . TCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTTCTTTCT TTC 0 . END=17756485;SVTYPE=DEL;SVLEN=-51;CIGAR=1M2I51D;NUM_MERGED_SVS=27724;STDDEV_POS=0.36,0.75
chr22 19002491 . TTGTGACACTTGACTAGTTTATGAGAGCAGAAGCTGTTACGTGACACTTAGCACATACTGC TCTCTGACTACTCACAGTCTGCA 0 . END=19002551;SVTYPE=DEL;SVLEN=-60;CIGAR=1M22I60D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 20555318 . CGTTGAGCTCCCAGAAGGGTTAAGTGATGCTGGGCCCTCCTCCTCCTCCTAG CTGCCCCCACCA 0 . END=20555369;SVTYPE=DEL;SVLEN=-51;CIGAR=1M11I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 21169457 . GAAAAGAAATAAATACAATTAATGCTGGTGCATGGTATTAAATCTAGTTTTT GGCAG 0 . END=21169508;SVTYPE=DEL;SVLEN=-51;CIGAR=1M4I51D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 21327704 . ATATATGATTTATCATCTATATCAGCTATGATATATCATCTATATCATATA AGG 0 . END=21327754;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=128;STDDEV_POS=69.91,64.52
chr22 21822886 . AAACATTTGAATGTTAAAGTTAATTTTATTTATCAAATAATCACCTACATTATGT ATAAGCTTAAATAAA 0 . END=21822940;SVTYPE=DEL;SVLEN=-54;CIGAR=1M14I54D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 23836990 . AGGCCAGCACAGGTCCCCATCGGTGGGGATCCTTCTGAGGGTGGGGAGAGG ACGTGT 0 . END=23837040;SVTYPE=DEL;SVLEN=-50;CIGAR=1M5I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 23969006 . GAAAACTGTTACTCTAACAACAAGTGTTATACACTTACCATGTGCTAGGTCCTCTACAGGTACTTTACACTCATGATCCCATTTGATCCTTACAATCCCTATC GCTTACTGAATGTCTAAAAAAACAAGTTTAAACTGTTTGTTACCCAAAGTTTGGTG 0 . END=23969108;SVTYPE=DEL;SVLEN=-102;CIGAR=1M55I102D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 24660986 . CCTCGTCACACACACACACACACACACACACACACACACACACACACACACAC CTCAGTCTCCAGATAAAA 0 . END=24661038;SVTYPE=DEL;SVLEN=-52;CIGAR=1M17I52D;NUM_MERGED_SVS=8;STDDEV_POS=0.00,0.00
chr22 24783358 . TGGAGACAACATGTAGTTGGATCATGTTTTGTTATCCATCCACTCTCCCTTGAACAACTGAACAA TTTTTTCTTTTTTCTTTTTTTCTTT 0 . END=24783422;SVTYPE=DEL;SVLEN=-64;CIGAR=1M24I64D;NUM_MERGED_SVS=2061;STDDEV_POS=3.00,3.11
chr22 24967152 . CCAGCCTTTTAAAAGACAGGGCCTAGAAAAATCACAATTTGCTGACAGGGCC CAGCCT 0 . END=24967203;SVTYPE=DEL;SVLEN=-51;CIGAR=1M5I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 25157577 . AACTGTAGAGCTTATATTAACAGAAATTCTGAGTTAAAAAGAACATCAAATTTGTCTATCTCCATG ATGAGTGCCACAGAGTGCCACTC 0 . END=25157642;SVTYPE=DEL;SVLEN=-65;CIGAR=1M22I65D;NUM_MERGED_SVS=3;STDDEV_POS=0.00,0.00
chr22 25849484 . TGCATATATACACATGCATGTGTGATGCATACTCATGCATGCTATTGAGTACC TATAG 0 . END=25849536;SVTYPE=DEL;SVLEN=-52;CIGAR=1M4I52D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 26691835 . CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTT CC 0 . END=26691885;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=3;STDDEV_POS=9.24,8.66
chr22 28032218 . AAAATATATATATATAAAATATATATATATAAAATATATATATATATATAGTGT ATATA 0 . END=28032271;SVTYPE=DEL;SVLEN=-53;CIGAR=1M4I53D;NUM_MERGED_SVS=478;STDDEV_POS=19.29,14.97
chr22 28591029 . ACACACACACACATATATATATATATATATATATATATATATATATATATAT ATTG 0 . END=28591080;SVTYPE=DEL;SVLEN=-51;CIGAR=1M3I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 29252100 . TGCCTAGCGAAGGTCATTCATTTTTAGATCCTGCCCCTGTAATACTCGAAAGGGGATTACTTTGGCATG TCCGCATCACATGGATCGGGTGACCCT 0 . END=29252168;SVTYPE=DEL;SVLEN=-68;CIGAR=1M26I68D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 29404642 . TTAGTTTATTAAAGTAGTTAAGCCTCAGGATTAAAACAGTAACATTAGATAATGAGAAATAAAATG TGTACTGAGTACAAGTACT 0 . END=29404707;SVTYPE=DEL;SVLEN=-65;CIGAR=1M18I65D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 29643962 . CGGGGGGCTGACCCCCCCACCTCCCTCCCGGACGGGGCGGCTGGCCTGGCC CT 0 . END=29644012;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=58;STDDEV_POS=8.36,4.50
chr22 31189307 . TGTGGAGACAGCCTTGCTCTCTTGCCCAGGCTGGAGTACAGTGGTGCAGTCTTGGCTCACTTGCAACCTCTGCCTCCTGGGCTCAAGTGGTTCTCCTGTCTCAGCCTC TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTTTTATTTTTCCT 0 . END=31189414;SVTYPE=DEL;SVLEN=-107;CIGAR=1M73I107D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 32415116 . AAGGAGCGTGCAACCTAAATCCCTTGCACAGGCAGTTCACAATAGGGTTTGTGCTCC AGGAGCACAAACCCTATTGTGAACTGCCTGTGCAAGGGATTTAGGTTGCACGCTCCT 0 . END=32415172;SVTYPE=DEL;SVLEN=-56;CIGAR=1M56I56D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 33591062 . GTGGTGGTGGGTGCCTGTAATTCCAGCTATTCGGGAGGCTGAGGCAGAAGAAT GATTAGTGG 0 . END=33591114;SVTYPE=DEL;SVLEN=-52;CIGAR=1M8I52D;NUM_MERGED_SVS=3;STDDEV_POS=0.00,0.00
chr22 34278979 . AAACATATATATATAATATATATAATATATAATATATATAAAATATATATA AT 0 . END=34279029;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=12856;STDDEV_POS=19.14,21.78
chr22 36538143 . GCTGGGATTACAGGTGTGAGCCACCGCAACTGGCCCATTGGCCTTTCTTGTTGTACTGTTCTGTCCCTTCCAGGTAAGACAGGTACATTTTC GGATGGATCACTGGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGATGAATCCCCGTCTCTACTAAAAA 0 . END=36538234;SVTYPE=DEL;SVLEN=-91;CIGAR=1M71I91D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 36751263 . TACATATATGATTTATATATCATATATATGATATATATGATTTATATATGA TT 0 . END=36751313;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=484;STDDEV_POS=4.91,1.37
chr22 36751599 . TCATATATGTCATATATATCATATATATCATATATATATCATATATATCAT TATC 0 . END=36751649;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=8;STDDEV_POS=50.72,20.32
chr22 36872621 . GATTGGAGGTGAGGGTGGAGGTAAGAGTGGAGATGAGATTGGAGGTGAGGA GTT 0 . END=36872671;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=42;STDDEV_POS=35.50,15.34
chr22 37434686 . ACGGTCCCTGGGGAGGGGGATGCATTGTATATATTCCCAAGACTCCATGGCAAAGGGAGGGTTT ATTCTACTGCTACCA 0 . END=37434749;SVTYPE=DEL;SVLEN=-63;CIGAR=1M14I63D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 38083742 . AGGAGGGTGTACTCAGAGACAGGTGCACCAGGAGCCGGGGGCTGGGGATAG ACGGCGCTCCTGC 0 . END=38083792;SVTYPE=DEL;SVLEN=-50;CIGAR=1M12I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 38446974 . ACAGAGGCAAACGCCCACCCTTACGGGGCAGGGGGTAAAGGCTCAGAGAGGT AAAAAAAAAAAAAAAAA 0 . END=38447025;SVTYPE=DEL;SVLEN=-51;CIGAR=1M16I51D;NUM_MERGED_SVS=12;STDDEV_POS=0.00,0.00
chr22 38791724 . TCTGTTGCCCAGGCTGGAGTGCAGTAGTGTGATCTCAGCTCAATGCAACATCCA TAAAC 0 . END=38791777;SVTYPE=DEL;SVLEN=-53;CIGAR=1M4I53D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 41175788 . TTACTGAATGAAGGAGAAAAAAATCACAGGATGTGCATTTCAGTTCTATTTA TCC 0 . END=41175839;SVTYPE=DEL;SVLEN=-51;CIGAR=1M2I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 42566508 . ATAGCACCACTGCACTGCAGCCTGGGCGACAGAGCGAGACTCCATCTCAAG AAGA 0 . END=42566558;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=15;STDDEV_POS=0.00,0.00
chr22 42858334 . TCCCAGTTCAGCCAGCTGCTTCCTAGCTCTGTGGCCTTGGTCAAGACACTT TAA 0 . END=42858384;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 44335829 . GTGTGCATGCATGTGTGATGTGTGTCTGTGTATGTGTGGTAAGTGTGGTGT GC 0 . END=44335879;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22 46977396 . TTGGGGTGAGTCAGCATCACCCCTCTGTCCCCAAGAAGCTCAGAGCCTGGTGGGATGGAGCA TCTCTGAGTGGGGGAGGACAGCTGAT 0 . END=46977457;SVTYPE=DEL;SVLEN=-61;CIGAR=1M25I61D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22 47142380 . CGCCCACCCATTCTTCCATCCATCCTGCCACCCACCCATCCATTCACCCAC CTG 0 . END=47142430;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=6;STDDEV_POS=34.93,77.57
chr22 47343152 . TCTATGCCCTACCCAATCCTGTCCTACCCAATCCCTGTCCTACCCAATCTA TTCC 0 . END=47343202;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=14688;STDDEV_POS=53.87,12.85
chr22 49835711 . GTAGTGGCAAAAAATAAATAAATAAATAAGAATAAATAATAGGCCGGGTGT GC 0 . END=49835761;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=21;STDDEV_POS=16.23,15.49
chr22 50073047 . GT GCTCCCGGGCAGGCGTGGGCCCCTTCTCGGCAGTCCACCCGGCCACACTGG 0 . END=50073048;SVTYPE=INS;SVLEN=50;CIGAR=1M50I1D;NUM_MERGED_SVS=18;STDDEV_POS=10.11,10.11
chr22 50073047 . GTTCCCGGGCAGGCGTGGGCCCCTTCTCCGCAGTCCACCCGGCCATACCAT GC 0 . END=50073097;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=4;STDDEV_POS=21.50,2.50
chr22 50383788 . GCACTTTGGGAGGCTGAGGTGGGCGGATCACCTGAGGTCAGGAGTTCAAGA GT 0 . END=50383838;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
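The CIGAR pattern in the records above can be checked mechanically. The sketch below parses the CIGAR tag from the INFO column, tests whether it has the shape 1M, then an insertion, then a deletion, and confirms that the REF and ALT lengths agree with it. The function names are hypothetical, and only the M, I and D operations that appear in this list are handled.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MID])")


def parse_cigar(cigar):
    """Split a CIGAR such as '1M2I50D' into [(1, 'M'), (2, 'I'), (50, 'D')]."""
    ops = CIGAR_OP.findall(cigar)
    # Reject strings containing anything other than M/I/D operations
    if "".join(length + op for length, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    return [(int(length), op) for length, op in ops]


def is_1m_i_d(ops):
    """True if the CIGAR is exactly 1M followed by one I and then one D."""
    return [op for _, op in ops] == ["M", "I", "D"] and ops[0][0] == 1


def check_record(ref, alt, info):
    """Compare the REF/ALT lengths of one record against its CIGAR tag."""
    tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    ops = parse_cigar(tags["CIGAR"])
    matched = sum(n for n, op in ops if op == "M")
    inserted = sum(n for n, op in ops if op == "I")
    deleted = sum(n for n, op in ops if op == "D")
    return {
        "is_1M_I_D": is_1m_i_d(ops),
        # REF covers the matched and deleted bases
        "ref_len_ok": len(ref) == matched + deleted,
        # ALT covers the matched and inserted bases
        "alt_len_ok": len(alt) == matched + inserted,
    }


# Example: the chr22:11455434 record from the list above
print(check_record(
    "TATGAGGGACAAACATTCAGACCACGGGAGCAGTGTTCTGGAATCCTACGT",
    "TGA",
    "END=11455484;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=3;STDDEV_POS=30.04,3.06",
))
```

Running this over the whole list gives a quick way to separate the 1M/I/D records from ones like CIGAR=1M391D, which has no insertion.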
These are the 10 CRAMs from 1000 Genomes that were genotyped.
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00096.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00097.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00099.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00100.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00101.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00102.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00103.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00105.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00106.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00107.final.cram