VCF v4.2: vcf.Reader parser complains while parsing a VCF file
I have 40 VCF files that I need to convert into a tabular format, but the vcf.Reader parser complains while I iterate over the rows. The same script worked on other VCF files, for example VCF v4.1 files, and converted them into a tabular format successfully. On the current version it throws an error on the following lines:
for variant in Va_BM:
    tumor_REL = variant.samples[0]
    normal_ID = variant.samples[1]
The error is:
ValueError Traceback (most recent call last)
/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in _parse_samples(self, samples, samp_fmt, site)
464 try:
--> 465 sampdat[i] = int(vals)
466 except ValueError:
ValueError: invalid literal for int() with base 10: '26,15'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-216-6065142d89c7> in <module>()
9
10 # iterating lines in VCF file (one line = one variant)
---> 11 for variant in Va_BM:
12 tumor_REL = variant.samples[0]
13 normal_ID = variant.samples[1]
/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in __next__(self)
565
566 if fmt is not None:
--> 567 samples = self._parse_samples(row[9:], fmt, record)
568 record.samples = samples
569
/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in _parse_samples(self, samples, samp_fmt, site)
465 sampdat[i] = int(vals)
466 except ValueError:
--> 467 sampdat[i] = float(vals)
468 elif entry_type == 'Float':
469 sampdat[i] = float(vals)
ValueError: could not convert string to float: '26,15'
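The traceback above shows the parser calling int() and then float() on the whole string '26,15', which suggests that a FORMAT field carrying several comma-separated values is being read as a single number. One possible cause, which is an assumption here and not something confirmed for these files, is a header line that declares such a field with Number=1. The following is a minimal sketch using only the standard library to check for that; the function name find_number_mismatches and the example AD field are hypothetical, and it assumes tab-separated columns and header lines that start with ID= followed by Number=.

```python
import re

# Matches header lines such as ##FORMAT=<ID=AD,Number=1,Type=Integer,...>
FORMAT_HEADER = re.compile(r'##FORMAT=<ID=([^,]+),Number=([^,]+),')

def find_number_mismatches(lines):
    """Return {FORMAT id: example value} for fields declared Number=1
    in the header whose per-sample value contains a comma."""
    declared_single = set()
    mismatches = {}
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('##FORMAT='):
            match = FORMAT_HEADER.match(line)
            if match and match.group(2) == '1':
                declared_single.add(match.group(1))
        elif line.startswith('#') or not line:
            continue  # other header lines and blank lines
        else:
            cols = line.split('\t')
            if len(cols) < 10:
                continue  # no FORMAT/sample columns on this line
            keys = cols[8].split(':')
            for sample in cols[9:]:
                for key, value in zip(keys, sample.split(':')):
                    if key in declared_single and ',' in value:
                        # keep the first offending value as an example
                        mismatches.setdefault(key, value)
    return mismatches

# Made-up example data; AD is deliberately declared with Number=1
example = [
    '##fileformat=VCFv4.2',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allelic depths">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL',
    'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:26,15:41\t0/0:30,0:30',
]
print(find_number_mismatches(example))  # {'AD': '26,15'}
```

To run it on a real file, pass an open file handle, for example find_number_mismatches(open('your_file.vcf')). If a field is reported, comparing its header line with the one in a VCF v4.1 file that parses correctly may show where the two differ.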
I had a similar issue, though I am not sure it is exactly the same one. You can access the sample names right after opening the VCF file, rather than inside the for loop over each VCF line.
import vcf

SpHomAlleles = vcf.Reader(open('Sp.AllHomVarSameGT.vcf', 'r'))
# list of all sample names, available before iterating over records
sample_names = SpHomAlleles.samples
# if you want the first sample name, get it before running the for loop
first_sample = sample_names[0]

for record in SpHomAlleles:
    chrom = record.CHROM
    pos = record.POS
    # call data for just the specific sample selected earlier
    first_call = record.genotype(first_sample)
    gt = first_call['GT']
    # nested loop to go through every sample in this VCF record/line
    for sample in record.samples:
        call = record.genotype(sample.sample)
        sample_gt = call['GT']
I am not sure whether you have the same issue or a different one. Hope it helps.
I am facing the same issue. I want to iterate through the variants, but as soon as the iterator is called, it fails on the POS column of the VCF. Actual error:
ValueError: invalid literal for int() with base 10: ''
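An empty string reaching int() suggests that at least one data line has nothing in its POS column. A small sketch for locating such lines before handing the file to the parser is shown below; the helper name find_bad_pos is hypothetical, and it assumes the columns are tab-separated.

```python
def find_bad_pos(lines):
    """Return (line number, line) pairs for data lines whose second
    column (POS) is missing, empty, or not a plain integer."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith('#'):
            continue  # skip header lines
        text = line.rstrip('\n')
        cols = text.split('\t')
        pos = cols[1] if len(cols) > 1 else ''
        if not pos.isdigit():
            bad.append((lineno, text))
    return bad

# Made-up example data: line 4 has an empty POS column
example = [
    '##fileformat=VCFv4.2',
    '#CHROM\tPOS\tID\tREF\tALT',
    'chr1\t100\t.\tA\tG',
    'chr1\t\t.\tA\tG',
]
print(find_bad_pos(example))  # [(4, 'chr1\t\t.\tA\tG')]
```

Blank lines and lines with doubled tabs are both reported by this check, so the returned line numbers are a starting point for inspecting the file by hand.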