TreeToReads
output VCF of SNPs
There's some oddness going on with the sim.vcf file that TTR outputs:
Each row after the header rows seems to be broken by an extra line break between the ALT field and the QUAL field. I'll post a fix if I can find where this is happening.
e.g.
##fileformat=VCFv4.0
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 20XX-06299 20XX-06660 20XX-08288 20XX-15107 20XX-15253 20XX-06769 20XX-14784 Reference 20XX-08336 20XX-10255 20XX-08874 20XX-09721 20XX-12844 20XX-15225 20XX-11321 20XX-06179
NC_016856.1 random 10Kb of NC_016856.1 chromosome 1192 . A C
40 PASS . GT 0 0 0 0 0 0 1 0 0 0 0 0 0 00 0
NC_016856.1 random 10Kb of NC_016856.1 chromosome 3094 . C T
40 PASS . GT 0 0 0 0 0 0 0 0 1 1 1 0 0 00 0
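A quick way to see which rows are affected is to compare each data row's field count against the `#CHROM` header line. This is only a rough sketch: `find_short_rows` is a hypothetical helper, not part of TreeToReads, and it assumes the file is tab-delimited as the VCF format requires.

```python
def find_short_rows(lines):
    """Return (line_number, field_count) for data rows whose field count
    differs from the #CHROM header line. Hypothetical helper, not part of TTR."""
    expected = None
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and '##' meta-information lines.
        if not line or line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        fields = len(line.split("\t"))
        if expected is not None and fields != expected:
            bad.append((number, fields))
    return bad


if __name__ == "__main__":
    # Assumes sim.vcf sits in the current working directory.
    with open("sim.vcf") as handle:
        for number, fields in find_short_rows(handle):
            print("line {}: {} fields".format(number, fields))
```

On a file with the problem described here, every data row should show up as a pair of short lines.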
Found it!
The culprit is lines 915-916 of treetoreads.py, where the triple-quoted format string contains a literal line break:
fi.write('''{chrm}\t{loc}\t.\t{refbase}\t{altbase}
\t40\tPASS\t.\tGT\t{vars}\n'''.format(chrm=contig_name,
...should be on a single line like this:
fi.write('''{chrm}\t{loc}\t.\t{refbase}\t{altbase}\t40\tPASS\t.\tGT\t{vars}\n'''.format(chrm=contig_name,
With this change, sim.vcf comes out as valid VCF that plays nice with tools such as vcftools.
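For anyone wondering why the split matters: inside a triple-quoted Python string, a physical line break in the source is part of the string, so the two-line version writes a newline after the ALT value. A minimal sketch of the difference follows; the values in `values` are made up for illustration and are not taken from treetoreads.py.

```python
# Hypothetical stand-ins for the variables passed to .format() in treetoreads.py.
values = dict(chrm="contig1", loc=1192, refbase="A", altbase="C", vars="0\t1")

# Two-line version from the issue: the source line break ends up in the output.
broken_template = '''{chrm}\t{loc}\t.\t{refbase}\t{altbase}
\t40\tPASS\t.\tGT\t{vars}\n'''

# Single-line version: the only newline is the explicit \n at the end.
fixed_template = '''{chrm}\t{loc}\t.\t{refbase}\t{altbase}\t40\tPASS\t.\tGT\t{vars}\n'''

broken_row = broken_template.format(**values)
fixed_row = fixed_template.format(**values)

print(repr(broken_row))
print(repr(fixed_row))
```

The `repr` of the first row shows a `\n` sitting between the ALT value and the tab before `40`, which is exactly the break seen in sim.vcf.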