Running the run_calls wrapper with different numbers of samples produces different SNPs
Hi Zam,
I have three bacterial samples that I want to use to look for SNPs against an assembled reference. To do this, I have tried two methods using the run_calls wrapper. The first was to run each sample against the reference individually. The second, was to run all three samples together against the reference.
The command I used was:
/path/to/CORTEX_release_v1.0.5.17/scripts/calling/run_calls.pl --first_kmer 31 --fastaq_index /path/to/index --auto_cleaning yes --bc yes --pd no --outdir output --outvcf outputvcf --ploidy 1 --stampy_hash stampy_refhash --stampy_bin /path/to/stampy-1.0.31/stampy.py --list_ref_fasta list_ref_fasta --refbindir /path/to/working_directory --genome_size 2800000 --qthresh 5 --mem_height 17 --mem_width 1000 --vcftools_dir /path/to/vcftools_0.1.11 --do_union yes --ref CoordinatesOnly --workflow joint --logfile logfile_output log_1.txt
When I compared the two methods using the 'wk_flow_J_RefCO_FINALcombined_BC_calls_at_all_k.decomp.vcf' I was expecting to see the same SNPs/positions called for each sample, regardless of the method used, however this was not the case. The only difference between the two methods was the index files used:
For running samples individually (e.g. for sample 1):
cat index sample1 . /path /to/sample1_R1 /path /to/sample1_R2
For running samples together:
cat index sample1 . /path /to/sample1_R1 /path /to/sample1_R2 sample2 . /path /to/sample2_R1 /path /to/sample2_R2 sample3 . /path /to/sample3_R1 /path /to/sample3_R2
Am I doing something wrong?
Thanks
When you pass in just one sample, and use the joint workflow with CoordinatesOnly, you are asking it to find bubbles in the graph of just one sample. So by definition all you can spot are paralogs, or heterozygous sites.
If you want to catch times when the sample differs from the reference, use
--ref CoordinatesAndInCalling
Check the figure here: https://www.ncbi.nlm.nih.gov/pubmed/23172865
Am I making sense? Have I missed something?