
Starting with comma in alt field

Open olekto opened this issue 10 years ago • 0 comments

Hi, I was trying to filter a VCF created with the 4.0 beta version when the filtering script crashed with this error:

Traceback (most recent call last):
  File "/projects/cees/bin/lobstr/lobSTR-bin-Linux-x86_64-4.0.0/share/lobSTR/scripts/lobSTR_filter_vcf.py", line 132, in <module>
    for record in reader:
  File "/cluster/software/VERSIONS/python_packages-2.7_5/cluster/software/VERSIONS/python2-2.7.10/lib/python2.7/site-packages/vcf/parser.py", line 539, in next
    alt = self._map(self._parse_alt, row[4].split(','))
  File "/cluster/software/VERSIONS/python_packages-2.7_5/cluster/software/VERSIONS/python2-2.7.10/lib/python2.7/site-packages/vcf/parser.py", line 347, in _map
    for x in iterable]
  File "/cluster/software/VERSIONS/python_packages-2.7_5/cluster/software/VERSIONS/python2-2.7.10/lib/python2.7/site-packages/vcf/parser.py", line 515, in _parse_alt
    elif str[0] == '.' and len(str) > 1:
IndexError: string index out of range
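The last frame indexes `str[0]` on each comma-separated ALT allele, so an empty allele would be enough to raise this error. A minimal sketch of that, assuming nothing about PyVCF beyond the two expressions visible in the traceback; the helper names `first_char_check` and `check_alt_field` are hypothetical and are not part of PyVCF:

```python
def first_char_check(allele):
    # Hypothetical helper: mimics only the expression on the last traceback
    # frame, not the rest of PyVCF's _parse_alt.
    return allele[0] == '.' and len(allele) > 1


def check_alt_field(alt_field):
    """Return the error message if any allele breaks the check, else None."""
    # Same split as in the traceback: row[4].split(',')
    alleles = alt_field.split(',')
    try:
        for allele in alleles:
            first_char_check(allele)
    except IndexError as exc:
        return str(exc)
    return None


# ALT value copied from the record quoted in this issue.
bad_alt = ",AAATAAAA,AAATAAAAATAAAATAAAAAAAATAAAAT"
print(bad_alt.split(','))        # the first element is an empty string
print(check_alt_field(bad_alt))  # error message, or None if nothing failed
```

An ALT value with no empty alleles, such as `A,T`, passes through the same check without raising.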

The following entry is likely the culprit:

LG15 1591293 . AAATAAAAATAAAATAAAA ,AAATAAAA,AAATAAAAATAAAATAAAAAAAATAAAAT 669.516 . END=1591311;MOTIF=AAATA;NS=10;REF=3.8;RL=19;RU=AAATA;VT=STR;RPA=0,1.6,5.8 GT:ALLREADS:AML:DISTENDS:DP:GB:PL:Q:SB:STITCH 0/3:0|1;10|1:0.980102/0.961652:10:2:0/10:16,22,191,22,125,122,0,26,24,20:0.941778:2:0 0/3:0|2;10|3:0.999776/0.999994:28:5:0/10:52,67,478,67,347,341,0,52,49,37:0.999769:1.44444:0 0/3:0|1;10|1:0.980102/0.961652:51:2:0/10:16,22,191,22,125,122,0,26,24,20:0.941778:4.25:0 3/3:10|2:0.999969/0.999969:67:2:10/10:44,50,197,50,197,197,5,6,6,0:0.499146:2:0 0/3:0|4;10|2:1/0.997848:41.6667:6:0/10:26,44,574,44,311,299,0,105,99,87:0.997848:26:0 2/3:-11|4;-1|1;10|2:0.999984/0.999784:8:7:-11/10:165,149,354,44,191,176,140,112,0,385:0.999784:26:0 1/0:-19|1;0|7;10|1:0.998671/0.999881:-3.875:9:-19/0:71,0,740,29,283,288,73,161,180,233:0.998671:4.25:0 0/0:0|1:0.993932/0.993932:-28:1:0/0:0,3,98,3,33,30,3,29,27,26:0.330906:5:0 3/3:10|4:1/1:1.5:4:10/10:89,101,395,101,395,395,11,12,12,0:0.798921:5:0 0/3:0|2;10|1:0.999907/0.940512:6.66667:3:0/10:13,22,287,22,155,149,0,52,49,43:0.940419:9.25:0
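To check whether other records have the same problem, the VCF can be scanned as plain text without going through PyVCF. This is only a sketch: the function name `find_empty_alt_alleles` is hypothetical, and it assumes a tab-delimited VCF with ALT in the fifth column.

```python
def find_empty_alt_alleles(lines):
    """Return (chrom, pos, alt) for records whose ALT contains an empty allele."""
    hits = []
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, alt = fields[0], fields[1], fields[4]
        # A leading, trailing, or doubled comma yields an empty allele.
        if any(allele == "" for allele in alt.split(",")):
            hits.append((chrom, pos, alt))
    return hits
```

It accepts any iterable of lines, so an open file handle for the allelotype output can be passed directly, and each hit gives the position of a record to inspect.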

The ALT field starts with a comma, which I don't think conforms to the VCF spec. allelotype was run on a set of BWA-aligned BAMs, like this:

allelotype --command classify --bam bam1,bam2,bam3,bam4,bam5,bam6,bam7,bam8,bam9,bam10 --strinfo genome_strinfo.tab --noise_model /projects/cees/bin/lobstr/lobSTR-bin-Linux-x86_64-3.0.2/share/lobSTR/models/illumina_v2.0.3 \
  --index-prefix genome_index/lobSTR_ --out genome_ncc \
  --filter-mapq0 --realign --max-repeats-in-ends 3 --min-read-end-match 10 >allelotype_ncc.out 2> allelotype_ncc.err

olekto, Nov 19 '15 07:11