
SV in input VCF missed in graphtyper output VCF

Open tgong1 opened this issue 3 years ago • 2 comments

Hi,

I used manta v1.6.0 for SV discovery and svimmer to create a multi-sample VCF as the input for graphtyper v2.7.5 (graphtyper genotype_sv). Some SVs in my input VCF file are missing from the genotype_sv output VCF file. I've checked the missing SVs, and all of them are greater than 50 bp. I've also checked the ALT field of the missing SVs: some have sequences, while others are <DEL>, <INS>, or <DUP:TANDEM>. Could you suggest why some SVs are missing? Is there a filtering step in graphtyper that I may have missed? Any insight or help is appreciated. Let me know if you would like to look at the input and output VCF files.

Thank you, Tingting

tgong1 avatar Oct 19 '22 02:10 tgong1

Hi, sorry for the late response, I have been on a long leave. Could you create an example of input VCF and the commands that you are running?

hannespetur avatar Dec 01 '22 13:12 hannespetur

@hannespetur

I am seeing a similar issue. Some of the svimmer input variants are not found in the graphtyper output.
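One way to list which input records have no counterpart in the output is sketched below. This is only an illustration, not graphtyper's or svimmer's own tooling: it assumes that a genotyped record keeps the same CHROM, POS and INFO/SVTYPE as the svimmer record it came from, which may not hold if the output represents a variant differently. The function names are hypothetical.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF for text reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def sv_keys(lines):
    """Collect (CHROM, POS, SVTYPE) for every record line of a VCF."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        svtype = "."
        for entry in info.split(";"):
            if entry.startswith("SVTYPE="):
                svtype = entry[len("SVTYPE="):]
        keys.add((chrom, pos, svtype))
    return keys


def missing_records(input_lines, output_lines):
    """Return keys present in the input but absent from the output.

    Assumption: matching on (CHROM, POS, SVTYPE) is enough to pair
    an input record with its genotyped counterpart.
    """
    return sorted(sv_keys(input_lines) - sv_keys(output_lines))
```

With real files this could be called as `missing_records(open_vcf("svimmer.vcf.gz"), open_vcf("results.vcf.gz"))`, where both file names are placeholders.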

Below is the list of svimmer input DEL variants that are affected. The CIGAR strings all seem to have a 1M followed by an insertion and then a deletion. Most are DELs between 50 and 53 bp. The 30x HGDP+1kg CRAMs were genotyped with this Nextflow script:

NFILES=$(($task.cpus+$task.cpus))
echo "NFILES: $NFILES"
graphtyper genotype_sv $launchDir/ref/Homo_sapiens_assembly38.fasta $svimmer \
  --sams=${cram_list} --threads=$task.cpus \
  --max_files_open=$NFILES --verbose --region=$region --output="results"

This was a test run on 10 CRAMs with the chr22 svimmer input based on over 48k CRAMs. There are 12,838 DELs in the chr22 svimmer vcf.gz.

These are the variants that were missing in the graphtyper output.

chr22   10950658        .       AGACCAAAACAAAACAAAAGGCAACATGTGAAGGTACAAAGTGATATATGGAG   AAGACCA 0       .       END=10950710;SVTYPE=DEL;SVLEN=-52;CIGAR=1M6I52D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   11342195        .       CCTTTGAGACAAAACTTCCAGAGGAACGATCAAGCAGCAGCATTTGCACTTCACC CGGAAAAA        0       .       END=11342249;SVTYPE=DEL;SVLEN=-54;CIGAR=1M7I54D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   11455434        .       TATGAGGGACAAACATTCAGACCACGGGAGCAGTGTTCTGGAATCCTACGT     TGA     0       .       END=11455484;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=3;STDDEV_POS=30.04,3.06
chr22   11460476        .       GAGACAAACATTCAGACCACAGCAGGAGTGTTCTGGAGTCCTATGTGAGGG     GGT     0       .       END=11460526;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   11901042        .       TTTCGTCCATTCATTTGGTGATGGACATGTAGGTTGATTCCATACACAAGC     TGGA    0       .       END=11901092;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   11952808        .       CATGATGGAAACCACAAGGCCAGTCCATGACTAGCTACACACATTGACATC     CTA     0       .       END=11952858;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   12199444        .       CATTATTCTATCCAAATTGGCTTATCTGTTAACCATTTTAAAGGTATAGGTTTTGGAG      CTAGCTGCCTAGCA  0       .       END=12199501;SVTYPE=DEL;SVLEN=-57;CIGAR=1M13I57D;NUM_MERGED_SVS=19;STDDEV_POS=0.00,0.00
chr22   15359802        .       ACGCGAGGGGCAAATATTCATGACCTCGTAGCAGTGTTCTGGAATCCTATG     ATA     0       .       END=15359852;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=3;STDDEV_POS=0.00,0.00
chr22   15391405        .       TCTTATGCGGGGGACAAACACTCAGAACCCAGCAGCAGTGTTCTGGAATCC     TT      0       .       END=15391455;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   15881986        .       GACCCTGGCGTCCCTGTTTCGAGTCCAGTGTGCGCCTAGGGTGGCTAGGGA     GG      0       .       END=15882036;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   16259516        .       TCGATGATGATTCCATTTGAGTCCATTCGATGATTCCATTCGATTCCATGCA    TTG     0       .       END=16259567;SVTYPE=DEL;SVLEN=-51;CIGAR=1M2I51D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   16263962        .       TTGATTCCATTCGTTGATGATTCCATTCGAGTCCATTCTCAGATTCCATTA     TC      0       .       END=16264012;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=165;STDDEV_POS=74.45,27.40
chr22   16306114        .       CGATTCCATTTGATGATGATTCTATTTGAGTCCATTCGATGATTCCATTTG     CT      0       .       END=16306164;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   16326624        .       ATCGATTCCATTCAATGATGATTCAGTTCGAGTCCATTCAATGATTCCATTCGATTCCATTCGATGATGATTTCATTCTAGTCCATTCAATGATTCCATTGGATTCCATTCAATGATGATTCCATCCAATGCCATTTGATGATTTCATTCGACTTCGTTTGATGATAATTCCATTCGATTCCACTCNATGATTCCATTGGATTCCATTCAATGATCATTCCTTTCAATTCCAATCGATGTTTCCATTCAATTCATTCGATGATGATTCCATTTGATTCCATTCGATGACTCCATTCGGGTCCGTTCAATTATTCCATTCGATCCCATCCCATGATGATTCCATTCGAGTCCATTCGGTGATGATTCCATTCGATTCAATTCGATGACTCAAT  A       0       .       END=16327015;SVTYPE=DEL;SVLEN=-391;CIGAR=1M391D;CIPOS=0,18;HOMLEN=18;HOMSEQ=TCGATTCCATTCAATGAT;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   16906741        .       GGGAAATTCCAAAACTTTAGGAAATCTTTCAATTCCCTTTGCCGATCTTTCTTAGATTTGATTTTA      GATTAATTTTCATAATTTAAT   0       .       END=16906806;SVTYPE=DEL;SVLEN=-65;CIGAR=1M20I65D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   17642713        .       CCTTCCAAGTAGCTGGGATTATGGGCACACACCACCATACCCAGCTAATTTTTT  CTGGGAGAA       0       .       END=17642766;SVTYPE=DEL;SVLEN=-53;CIGAR=1M8I53D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   17756434        .       TCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTTCTTTCT    TTC     0       .       END=17756485;SVTYPE=DEL;SVLEN=-51;CIGAR=1M2I51D;NUM_MERGED_SVS=27724;STDDEV_POS=0.36,0.75
chr22   19002491        .       TTGTGACACTTGACTAGTTTATGAGAGCAGAAGCTGTTACGTGACACTTAGCACATACTGC   TCTCTGACTACTCACAGTCTGCA 0       .       END=19002551;SVTYPE=DEL;SVLEN=-60;CIGAR=1M22I60D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   20555318        .       CGTTGAGCTCCCAGAAGGGTTAAGTGATGCTGGGCCCTCCTCCTCCTCCTAG    CTGCCCCCACCA    0       .       END=20555369;SVTYPE=DEL;SVLEN=-51;CIGAR=1M11I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   21169457        .       GAAAAGAAATAAATACAATTAATGCTGGTGCATGGTATTAAATCTAGTTTTT    GGCAG   0       .       END=21169508;SVTYPE=DEL;SVLEN=-51;CIGAR=1M4I51D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   21327704        .       ATATATGATTTATCATCTATATCAGCTATGATATATCATCTATATCATATA     AGG     0       .       END=21327754;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=128;STDDEV_POS=69.91,64.52
chr22   21822886        .       AAACATTTGAATGTTAAAGTTAATTTTATTTATCAAATAATCACCTACATTATGT ATAAGCTTAAATAAA 0       .       END=21822940;SVTYPE=DEL;SVLEN=-54;CIGAR=1M14I54D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   23836990        .       AGGCCAGCACAGGTCCCCATCGGTGGGGATCCTTCTGAGGGTGGGGAGAGG     ACGTGT  0       .       END=23837040;SVTYPE=DEL;SVLEN=-50;CIGAR=1M5I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   23969006        .       GAAAACTGTTACTCTAACAACAAGTGTTATACACTTACCATGTGCTAGGTCCTCTACAGGTACTTTACACTCATGATCCCATTTGATCCTTACAATCCCTATC GCTTACTGAATGTCTAAAAAAACAAGTTTAAACTGTTTGTTACCCAAAGTTTGGTG        0       .       END=23969108;SVTYPE=DEL;SVLEN=-102;CIGAR=1M55I102D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   24660986        .       CCTCGTCACACACACACACACACACACACACACACACACACACACACACACAC   CTCAGTCTCCAGATAAAA      0       .       END=24661038;SVTYPE=DEL;SVLEN=-52;CIGAR=1M17I52D;NUM_MERGED_SVS=8;STDDEV_POS=0.00,0.00
chr22   24783358        .       TGGAGACAACATGTAGTTGGATCATGTTTTGTTATCCATCCACTCTCCCTTGAACAACTGAACAA       TTTTTTCTTTTTTCTTTTTTTCTTT       0       .       END=24783422;SVTYPE=DEL;SVLEN=-64;CIGAR=1M24I64D;NUM_MERGED_SVS=2061;STDDEV_POS=3.00,3.11
chr22   24967152        .       CCAGCCTTTTAAAAGACAGGGCCTAGAAAAATCACAATTTGCTGACAGGGCC    CAGCCT  0       .       END=24967203;SVTYPE=DEL;SVLEN=-51;CIGAR=1M5I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   25157577        .       AACTGTAGAGCTTATATTAACAGAAATTCTGAGTTAAAAAGAACATCAAATTTGTCTATCTCCATG      ATGAGTGCCACAGAGTGCCACTC 0       .       END=25157642;SVTYPE=DEL;SVLEN=-65;CIGAR=1M22I65D;NUM_MERGED_SVS=3;STDDEV_POS=0.00,0.00
chr22   25849484        .       TGCATATATACACATGCATGTGTGATGCATACTCATGCATGCTATTGAGTACC   TATAG   0       .       END=25849536;SVTYPE=DEL;SVLEN=-52;CIGAR=1M4I52D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   26691835        .       CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTT     CC      0       .       END=26691885;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=3;STDDEV_POS=9.24,8.66
chr22   28032218        .       AAAATATATATATATAAAATATATATATATAAAATATATATATATATATAGTGT  ATATA   0       .       END=28032271;SVTYPE=DEL;SVLEN=-53;CIGAR=1M4I53D;NUM_MERGED_SVS=478;STDDEV_POS=19.29,14.97
chr22   28591029        .       ACACACACACACATATATATATATATATATATATATATATATATATATATAT    ATTG    0       .       END=28591080;SVTYPE=DEL;SVLEN=-51;CIGAR=1M3I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   29252100        .       TGCCTAGCGAAGGTCATTCATTTTTAGATCCTGCCCCTGTAATACTCGAAAGGGGATTACTTTGGCATG   TCCGCATCACATGGATCGGGTGACCCT     0       .       END=29252168;SVTYPE=DEL;SVLEN=-68;CIGAR=1M26I68D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   29404642        .       TTAGTTTATTAAAGTAGTTAAGCCTCAGGATTAAAACAGTAACATTAGATAATGAGAAATAAAATG      TGTACTGAGTACAAGTACT     0       .       END=29404707;SVTYPE=DEL;SVLEN=-65;CIGAR=1M18I65D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   29643962        .       CGGGGGGCTGACCCCCCCACCTCCCTCCCGGACGGGGCGGCTGGCCTGGCC     CT      0       .       END=29644012;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=58;STDDEV_POS=8.36,4.50
chr22   31189307        .       TGTGGAGACAGCCTTGCTCTCTTGCCCAGGCTGGAGTACAGTGGTGCAGTCTTGGCTCACTTGCAACCTCTGCCTCCTGGGCTCAAGTGGTTCTCCTGTCTCAGCCTC    TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTTTTATTTTTCCT      0       .       END=31189414;SVTYPE=DEL;SVLEN=-107;CIGAR=1M73I107D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   32415116        .       AAGGAGCGTGCAACCTAAATCCCTTGCACAGGCAGTTCACAATAGGGTTTGTGCTCC       AGGAGCACAAACCCTATTGTGAACTGCCTGTGCAAGGGATTTAGGTTGCACGCTCCT       0       .       END=32415172;SVTYPE=DEL;SVLEN=-56;CIGAR=1M56I56D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   33591062        .       GTGGTGGTGGGTGCCTGTAATTCCAGCTATTCGGGAGGCTGAGGCAGAAGAAT   GATTAGTGG       0       .       END=33591114;SVTYPE=DEL;SVLEN=-52;CIGAR=1M8I52D;NUM_MERGED_SVS=3;STDDEV_POS=0.00,0.00
chr22   34278979        .       AAACATATATATATAATATATATAATATATAATATATATAAAATATATATA     AT      0       .       END=34279029;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=12856;STDDEV_POS=19.14,21.78
chr22   36538143        .       GCTGGGATTACAGGTGTGAGCCACCGCAACTGGCCCATTGGCCTTTCTTGTTGTACTGTTCTGTCCCTTCCAGGTAAGACAGGTACATTTTC    GGATGGATCACTGGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATGATGAATCCCCGTCTCTACTAAAAA        0       .       END=36538234;SVTYPE=DEL;SVLEN=-91;CIGAR=1M71I91D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   36751263        .       TACATATATGATTTATATATCATATATATGATATATATGATTTATATATGA     TT      0       .       END=36751313;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=484;STDDEV_POS=4.91,1.37
chr22   36751599        .       TCATATATGTCATATATATCATATATATCATATATATATCATATATATCAT     TATC    0       .       END=36751649;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=8;STDDEV_POS=50.72,20.32
chr22   36872621        .       GATTGGAGGTGAGGGTGGAGGTAAGAGTGGAGATGAGATTGGAGGTGAGGA     GTT     0       .       END=36872671;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=42;STDDEV_POS=35.50,15.34
chr22   37434686        .       ACGGTCCCTGGGGAGGGGGATGCATTGTATATATTCCCAAGACTCCATGGCAAAGGGAGGGTTT        ATTCTACTGCTACCA 0       .       END=37434749;SVTYPE=DEL;SVLEN=-63;CIGAR=1M14I63D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   38083742        .       AGGAGGGTGTACTCAGAGACAGGTGCACCAGGAGCCGGGGGCTGGGGATAG     ACGGCGCTCCTGC   0       .       END=38083792;SVTYPE=DEL;SVLEN=-50;CIGAR=1M12I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   38446974        .       ACAGAGGCAAACGCCCACCCTTACGGGGCAGGGGGTAAAGGCTCAGAGAGGT    AAAAAAAAAAAAAAAAA       0       .       END=38447025;SVTYPE=DEL;SVLEN=-51;CIGAR=1M16I51D;NUM_MERGED_SVS=12;STDDEV_POS=0.00,0.00
chr22   38791724        .       TCTGTTGCCCAGGCTGGAGTGCAGTAGTGTGATCTCAGCTCAATGCAACATCCA  TAAAC   0       .       END=38791777;SVTYPE=DEL;SVLEN=-53;CIGAR=1M4I53D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   41175788        .       TTACTGAATGAAGGAGAAAAAAATCACAGGATGTGCATTTCAGTTCTATTTA    TCC     0       .       END=41175839;SVTYPE=DEL;SVLEN=-51;CIGAR=1M2I51D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   42566508        .       ATAGCACCACTGCACTGCAGCCTGGGCGACAGAGCGAGACTCCATCTCAAG     AAGA    0       .       END=42566558;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=15;STDDEV_POS=0.00,0.00
chr22   42858334        .       TCCCAGTTCAGCCAGCTGCTTCCTAGCTCTGTGGCCTTGGTCAAGACACTT     TAA     0       .       END=42858384;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   44335829        .       GTGTGCATGCATGTGTGATGTGTGTCTGTGTATGTGTGGTAAGTGTGGTGT     GC      0       .       END=44335879;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
chr22   46977396        .       TTGGGGTGAGTCAGCATCACCCCTCTGTCCCCAAGAAGCTCAGAGCCTGGTGGGATGGAGCA  TCTCTGAGTGGGGGAGGACAGCTGAT      0       .       END=46977457;SVTYPE=DEL;SVLEN=-61;CIGAR=1M25I61D;NUM_MERGED_SVS=2;STDDEV_POS=0.00,0.00
chr22   47142380        .       CGCCCACCCATTCTTCCATCCATCCTGCCACCCACCCATCCATTCACCCAC     CTG     0       .       END=47142430;SVTYPE=DEL;SVLEN=-50;CIGAR=1M2I50D;NUM_MERGED_SVS=6;STDDEV_POS=34.93,77.57
chr22   47343152        .       TCTATGCCCTACCCAATCCTGTCCTACCCAATCCCTGTCCTACCCAATCTA     TTCC    0       .       END=47343202;SVTYPE=DEL;SVLEN=-50;CIGAR=1M3I50D;NUM_MERGED_SVS=14688;STDDEV_POS=53.87,12.85
chr22   49835711        .       GTAGTGGCAAAAAATAAATAAATAAATAAGAATAAATAATAGGCCGGGTGT     GC      0       .       END=49835761;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=21;STDDEV_POS=16.23,15.49
chr22   50073047        .       GT      GCTCCCGGGCAGGCGTGGGCCCCTTCTCGGCAGTCCACCCGGCCACACTGG     0       .       END=50073048;SVTYPE=INS;SVLEN=50;CIGAR=1M50I1D;NUM_MERGED_SVS=18;STDDEV_POS=10.11,10.11
chr22   50073047        .       GTTCCCGGGCAGGCGTGGGCCCCTTCTCCGCAGTCCACCCGGCCATACCAT     GC      0       .       END=50073097;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=4;STDDEV_POS=21.50,2.50
chr22   50383788        .       GCACTTTGGGAGGCTGAGGTGGGCGGATCACCTGAGGTCAGGAGTTCAAGA     GT      0       .       END=50383838;SVTYPE=DEL;SVLEN=-50;CIGAR=1M1I50D;NUM_MERGED_SVS=1;STDDEV_POS=0.00,0.00
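The CIGAR pattern noted above (1M, then an insertion, then a deletion) can be checked record by record with a small sketch like the following. It reads only the INFO/CIGAR value shown in the listed records. The helper names are hypothetical, and it assumes the CIGAR strings use only the M, I and D operations that appear in this list.

```python
import re

# One CIGAR operation: a length followed by M, I or D.
CIGAR_OP = re.compile(r"(\d+)([MID])")


def info_value(info, key):
    """Return the value of KEY from a semicolon-separated INFO string."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def parse_cigar(cigar):
    """Split a CIGAR string such as '1M6I52D' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject strings containing anything other than M/I/D operations.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    return ops


def is_match_ins_del(cigar):
    """True if the CIGAR is exactly 1M, then one I, then one D."""
    ops = parse_cigar(cigar)
    return [op for _, op in ops] == ["M", "I", "D"] and ops[0][0] == 1
```

For example, `is_match_ins_del("1M6I52D")` is true, while the chr22:16326624 record with `CIGAR=1M391D` does not fit the pattern, so it may be worth looking at that record separately.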

These are the 10 crams from 1000 genomes that were genotyped.

/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00096.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00097.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00099.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00100.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00101.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00102.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00103.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00105.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00106.final.cram
/restricted/projectnb/casa/hgdp_1kg/3202g/cram/HG00107.final.cram

jjfarrell avatar Aug 14 '23 22:08 jjfarrell