Non-zero error rates for perfect reads
Hello dear NanoSim team,
I simulated 1,000,000 perfect reads and used them to train an error model. One would expect a zero error rate; however, the model reports an error rate of ~1.25%. I looked at some of the mappings in IGV but could not see any errors there. Do you have an explanation for this behaviour?
Best, Simon
> the model reports an error-rate of ~1.25%.
Is this value output by NanoSim, or did you calculate it yourself after aligning the reads to your reference?
Also, are you generating perfect reads with head/tail sequences? If so, it could help explain this.
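If head/tail sequences are present and the aligner leaves them unaligned, they would typically show up as soft-clipped bases (CIGAR operation `S`) at the read ends. A minimal sketch for checking a single CIGAR string follows; the function name `clipped_ends` is hypothetical, and this is only an illustration of the check, not something NanoSim provides.

```python
import re

# CIGAR operations as defined in the SAM format
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def clipped_ends(cigar):
    """Return (head, tail) soft-clip lengths for one CIGAR string.

    Hypothetical helper for illustration. Hard clips (H) are skipped
    because those bases are not present in the read sequence.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    head = ops[0][0] if ops and ops[0][1] == "S" else 0
    tail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return head, tail

if __name__ == "__main__":
    print(clipped_ends("12S100M7S"))  # read with clipped head and tail
    print(clipped_ends("100M"))       # fully aligned read
```

Reads whose CIGAR strings start or end with `S` would be the ones to look at more closely in IGV, since soft-clipped bases are easy to miss unless they are explicitly shown.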
Hi,
Sorry for the late reply.
Read simulation:
simulator.py transcriptome -rt transcriptome.fasta -e chr1_abundance.tsv -c model/training -n 1000000 --no_model_ir -t 128 -o perfect --perfect
Model creation:
read_analysis.py transcriptome -rt transcriptome.fasta -rg genome.fasta -i perfect_aligned_reads.fasta -o model_perfect/training -t 128 --no_intron_retention
I loaded genome.sorted.bam into IGV to check whether the mappings contain any mismatches or indels, but could not find any. However, the error rates in training_error_rate.tsv are not zero:
Mismatch rate: 0.0038233864689006677
Insertion rate: 0.007720972287073042
Deletion rate: 0.0009709514847554787
Total error rate: 0.012515310240729188
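One way to cross-check these numbers against the alignments themselves is to count errors directly from the CIGAR string and MD tag of each SAM record. The sketch below assumes the aligner wrote the MD tag (minimap2 only does so with `--MD`). The function names and the choice of denominator (matches + mismatches + inserted + deleted bases) are assumptions for illustration and may differ from how NanoSim defines its rates.

```python
import re

# CIGAR operations and MD tag tokens as defined in the SAM format
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")

def count_errors(cigar, md):
    """Return (matches, mismatches, inserted bases, deleted bases)."""
    aligned = inserted = deleted = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            inserted += n
        elif op == "D":
            deleted += n
        elif op in "M=X":
            aligned += n  # aligned columns: matches plus mismatches
    # In MD, a bare letter is a mismatched reference base;
    # "^" followed by letters is a deletion (already counted via CIGAR).
    mismatches = sum(1 for _, _, mm in MD_RE.findall(md) if mm)
    return aligned - mismatches, mismatches, inserted, deleted

def error_rates(counts):
    """Rates relative to all alignment columns (an assumed definition)."""
    matches, mismatches, inserted, deleted = counts
    total = matches + mismatches + inserted + deleted
    if total == 0:
        return {"mismatch": 0.0, "insertion": 0.0, "deletion": 0.0, "total": 0.0}
    return {
        "mismatch": mismatches / total,
        "insertion": inserted / total,
        "deletion": deleted / total,
        "total": (mismatches + inserted + deleted) / total,
    }

if __name__ == "__main__":
    # A perfect read: 100 aligned bases, no mismatches or indels
    print(error_rates(count_errors("100M", "100")))
    # A read with one mismatch, one inserted base and two deleted bases
    print(error_rates(count_errors("10M1I6M2D6M", "10A5^AC6")))
```

Summing the counts over all primary alignments in the BAM file and comparing the resulting rates with training_error_rate.tsv would show whether the non-zero values come from the alignments or from a later step in the analysis.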