
VCFv4.2: vcf.Reader parser complains while parsing a VCF file

Open ranijames opened this issue 8 years ago • 2 comments

I have 40 VCF files that I need to convert into a tabular format, but the vcf.Reader parser complains while I iterate over the rows. The same script worked on other VCF files, for example VCFv4.1 ones, and converted them into a tabular format successfully. On the current version it throws an error at the following lines:

    for variant in Va_BM:
        tumor_REL       = variant.samples[0]
        normal_ID       = variant.samples[1]
The error is:
ValueError                                Traceback (most recent call last)
/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in _parse_samples(self, samples, samp_fmt, site)
    464                         try:
--> 465                             sampdat[i] = int(vals)
    466                         except ValueError:

ValueError: invalid literal for int() with base 10: '26,15'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-216-6065142d89c7> in <module>()
      9 
     10     #  iterating lines in VCF file (one line = one variant)
---> 11     for variant in Va_BM:
     12         tumor_REL       = variant.samples[0]
     13         normal_ID       = variant.samples[1]

/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in __next__(self)
    565 
    566         if fmt is not None:
--> 567             samples = self._parse_samples(row[9:], fmt, record)
    568             record.samples = samples
    569 

/home/usr/Tools/anaconda3/lib/python3.4/site-packages/vcf/parser.py in _parse_samples(self, samples, samp_fmt, site)
    465                             sampdat[i] = int(vals)
    466                         except ValueError:
--> 467                             sampdat[i] = float(vals)
    468                     elif entry_type == 'Float':
    469                         sampdat[i] = float(vals)

ValueError: could not convert string to float: '26,15'
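
Judging only from the traceback, one plausible cause is a FORMAT field whose header line declares a single value (`Number=1`) while the data column holds a comma-separated pair such as `26,15`. This is a guess, not a confirmed diagnosis. The sketch below uses plain Python (no PyVCF) to list such fields; the function name `find_mismatched_format_fields` is hypothetical, and it assumes header lines follow the usual `##FORMAT=<ID=...,Number=...,` order.

```python
import re

# Assumes the usual header layout: ##FORMAT=<ID=XX,Number=N,Type=...,...>
FORMAT_HEADER = re.compile(r'##FORMAT=<ID=([^,]+),Number=([^,]+),')

def find_mismatched_format_fields(lines):
    """Return {FORMAT id: first offending value} for fields declared
    Number=1 in the header but holding comma-separated values."""
    declared_single = set()
    offenders = {}
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('##'):
            match = FORMAT_HEADER.match(line)
            if match and match.group(2) == '1':
                declared_single.add(match.group(1))
            continue
        if line.startswith('#') or not line:
            continue  # column header line or blank line
        cols = line.split('\t')
        if len(cols) < 10:
            continue  # no FORMAT/sample columns on this line
        keys = cols[8].split(':')
        for sample in cols[9:]:
            for key, value in zip(keys, sample.split(':')):
                if key in declared_single and ',' in value:
                    offenders.setdefault(key, value)
    return offenders
```

It can be run as `find_mismatched_format_fields(open('my.vcf'))`, where `my.vcf` is a placeholder file name. If a field shows up, comparing its `##FORMAT` header line with the same line in a file that parses cleanly would be a reasonable next step.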

ranijames May 15 '17 10:05

I had a similar issue, though I'm not sure it is exactly the same one. You can access the sample names right after opening the VCF file, rather than inside the for-loop over each VCF line.

import vcf

SpHomAlleles = vcf.Reader(open('Sp.AllHomVarSameGT.vcf', 'r'))

# list of sample names, available right after opening the file
sample_names = SpHomAlleles.samples
# name of the first sample, read before starting the for-loop
first_sample = sample_names[0]

# then
for record in SpHomAlleles:
    chrom = record.CHROM
    pos = record.POS

    # genotype of the specific sample picked earlier
    first_gt = record.genotype(first_sample)['GT']

    # or a nested for-loop over every sample call in this record/line
    for call in record.samples:
        gt = call['GT']  # call.sample holds the sample name

Not sure if you have the same issue or a different one. Hope it helps.

everestial Jun 05 '17 14:06

I'm facing the same issue. I want to iterate through the variants, but as soon as the iterator is called it fails on the POS column of the VCF. Actual error:

ValueError: invalid literal for int() with base 10: ''
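
An empty string reaching `int()` suggests that some data line has nothing in its second column, for example because of a blank line or a doubled tab. That is an assumption, not something the error message confirms. A minimal sketch for locating such lines without PyVCF follows; the function name `find_bad_pos_lines` is hypothetical.

```python
def find_bad_pos_lines(lines):
    """Return (line number, line text) for every non-header line whose
    second tab-separated column (POS) is not a plain integer."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith('#'):
            continue  # meta-information and column header lines
        text = line.rstrip('\n')
        cols = text.split('\t')
        pos = cols[1] if len(cols) > 1 else ''
        if not pos.isdigit():
            bad.append((lineno, text))
    return bad
```

Calling `find_bad_pos_lines(open('my.vcf'))`, with `my.vcf` as a placeholder file name, returns the 1-based line numbers to inspect. Blank lines are reported too, since they have no POS column at all.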

erprateek Jan 14 '19 22:01