NanoSim Non-zero error rates for perfect reads

Hello dear NanoSim team,

I simulated 1,000,000 perfect reads and used them to train an error model. One would expect a zero error rate; however, the model reports an error-rate of ~1.25%. I inspected some of the mappings in IGV but could not see any errors there. Do you have an explanation for this behaviour?

Best, Simon

Mar 31 '25 09:03 SimonHegele

the model reports an error-rate of ~1.25%.

Is this value output by NS, or calculated by you after sequence alignment to your reference?

Also, are you generating perfect reads with head/tail sequences? If so, it could help explain this.
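For an independent check after alignment, here is a minimal sketch that tallies mismatches, insertions, deletions and soft-clipped bases from SAM records (for example the output of `samtools view`). It is not part of NanoSim, and the helper names `count_errors` and `parse_sam_line` are made up for illustration. It assumes the aligner writes the `NM` tag (edit distance including indels), and it may well count things differently from how read_analysis.py does.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_errors(cigar, nm):
    """Tally alignment events from a CIGAR string and the NM edit distance.

    Hypothetical helper, not NanoSim code. N (intron), H and P operations
    are ignored, so spliced alignments do not inflate the deletion count.
    """
    counts = {"M": 0, "I": 0, "D": 0, "S": 0}
    for length, op in CIGAR_OP.findall(cigar):
        if op in "M=X":
            counts["M"] += int(length)      # aligned bases (match or mismatch)
        elif op in counts:
            counts[op] += int(length)       # inserted, deleted or soft-clipped bases
    # NM counts mismatches plus inserted and deleted bases.
    mismatches = nm - counts["I"] - counts["D"]
    return {
        "aligned": counts["M"],
        "mismatches": mismatches,
        "insertions": counts["I"],
        "deletions": counts["D"],
        "soft_clipped": counts["S"],
    }


def parse_sam_line(line):
    """Return event counts for one SAM record, or None if it cannot be used."""
    fields = line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":                        # unmapped read
        return None
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None:                          # aligner did not emit NM
        return None
    return count_errors(cigar, nm)


# Synthetic record for illustration only (not taken from the data in this thread).
example = "read1\t0\tchr1\t100\t60\t5S20M2I30M100N40M1D10M\t*\t0\t0\t*\t*\tNM:i:4"
print(parse_sam_line(example))
```

If perfect reads show non-zero insertion or soft-clip counts here, that would point at the alignment step (or head/tail sequences) rather than at the simulated bases.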

Mar 31 '25 17:03 warrenlr

Hi,

Sorry for my late reply.

Read simulation:
simulator.py transcriptome -rt transcriptome.fasta -e chr1_abundance.tsv -c model/training -n 1000000 --no_model_ir -t 128 -o perfect --perfect

Model creation:
read_analysis.py transcriptome -rt transcriptome.fasta -rg genome.fasta -i perfect_aligned_reads.fasta -o model_perfect/training -t 128 --no_intron_retention

I loaded genome.sorted.bam into IGV to check whether the mappings contain any mismatches or indels, but could not find any. However, the error rates in training_error_rate.tsv are not zero:

Mismatch rate: 0.0038233864689006677
Insertion rate: 0.007720972287073042
Deletion rate: 0.0009709514847554787
Total error rate: 0.012515310240729188
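As a quick sanity check on these numbers, the short sketch below confirms that the reported total is just the sum of the three component rates and shows how each component contributes. The values are copied from the rates quoted above; the variable names are only illustrative.

```python
# Rates copied from training_error_rate.tsv as quoted in this post.
mismatch = 0.0038233864689006677
insertion = 0.007720972287073042
deletion = 0.0009709514847554787
reported_total = 0.012515310240729188

# The total should be the plain sum of the three components.
total = mismatch + insertion + deletion
print(f"sum of components: {total:.6f}")
print(f"reported total:    {reported_total:.6f}")
print(f"total as percent:  {total * 100:.2f}%")

# Share of the total contributed by each error type.
shares = {
    "mismatch": mismatch / total,
    "insertion": insertion / total,
    "deletion": deletion / total,
}
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {share:.1%}")
```

Insertions account for the largest share of the total, which might be a useful hint when looking for where the non-zero rate comes from.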

Apr 03 '25 08:04 SimonHegele