TreeToReads

output VCF of SNPs

Open snacktavish opened this issue 9 years ago • 2 comments

snacktavish, Sep 08 '16 20:09

There's some oddness going on with the sim.vcf file that TTR outputs:

Each row after the header rows seems to be broken by an extra line break between the ALT field and the QUAL field. I'll post a fix if I can find where this is happening.

e.g.

##fileformat=VCFv4.0
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  20XX-06299      20XX-06660      20XX-08288      20XX-15107      20XX-15253      20XX-06769    20XX-14784      Reference       20XX-08336      20XX-10255      20XX-08874      20XX-09721      20XX-12844      20XX-15225      20XX-11321      20XX-06179
NC_016856.1 random 10Kb of NC_016856.1 chromosome       1192    .       A       C
                        40      PASS    .       GT      0       0       0       0       0       0       1       0       0       0       0       0       0    00       0
NC_016856.1 random 10Kb of NC_016856.1 chromosome       3094    .       C       T
                        40      PASS    .       GT      0       0       0       0       0       0       0       0       1       1       1       0       0    00       0
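
One quick way to confirm the symptom is to compare the number of tab-separated fields in each data row against the `#CHROM` header. The sketch below assumes a tab-delimited file; `find_short_rows` and the two-sample input are hypothetical and made up for illustration, not taken from TreeToReads or from the real sim.vcf.

```python
def find_short_rows(lines):
    """Return (line_number, field_count) for data rows whose field count
    differs from the #CHROM header row."""
    expected = None
    bad = []
    for num, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip meta-information lines and empty lines.
        if line.startswith("##") or not line.strip():
            continue
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        n = len(line.split("\t"))
        if expected is not None and n != expected:
            bad.append((num, n))
    return bad


# Made-up input: the first record is split across two lines, the second is intact.
sample = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
    "contig1\t1192\t.\tA\tC\n",
    "                 \t40\tPASS\t.\tGT\t0\t1\n",
    "contig1\t3094\t.\tC\tT\t40\tPASS\t.\tGT\t0\t1\n",
]
print(find_short_rows(sample))
```

With the split record, both halves are reported as short rows, while the intact record passes.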

willpitchers, Oct 03 '18 07:10

Found it!

lines 915:916 in treetoreads.py:

fi.write('''{chrm}\t{loc}\t.\t{refbase}\t{altbase}
                 \t40\tPASS\t.\tGT\t{vars}\n'''.format(chrm=contig_name,

...should be on a single line like this:

fi.write('''{chrm}\t{loc}\t.\t{refbase}\t{altbase}\t40\tPASS\t.\tGT\t{vars}\n'''.format(chrm=contig_name,

This renders the sim.vcf output as valid VCF that plays nice with e.g. vcftools.
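
The reason the line break shows up in the output is that a triple-quoted Python string keeps any newline and indentation typed inside it. A minimal sketch of the difference follows; the values in `fields` are hypothetical placeholders, not the variables used in treetoreads.py.

```python
# Hypothetical values, for illustration only.
fields = dict(chrm="contig1", loc=1192, refbase="A", altbase="C", vars="0\t1\t0")

# Original form: the literal spans two source lines, so the newline and the
# leading spaces of the second line become part of the written string.
broken = '''{chrm}\t{loc}\t.\t{refbase}\t{altbase}
                 \t40\tPASS\t.\tGT\t{vars}\n'''.format(**fields)

# Fixed form: the whole record is one literal on one source line.
fixed = '''{chrm}\t{loc}\t.\t{refbase}\t{altbase}\t40\tPASS\t.\tGT\t{vars}\n'''.format(**fields)

# repr() makes the embedded newline and padding visible.
print(repr(broken))
print(repr(fixed))

broken_rows = broken.splitlines()
fixed_rows = fixed.splitlines()
print(len(broken_rows), len(fixed_rows))
```

The broken form yields two output lines per record, which matches the split between ALT and QUAL described above; the fixed form yields one.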
