
Non-zero error rates for perfect reads

Open SimonHegele opened this issue 10 months ago • 2 comments

Hello dear NanoSim team,

I simulated 1,000,000 perfect reads and used them to train an error model. One would expect a zero error rate; however, the model reports an error rate of ~1.25%. I looked at some of the mappings in IGV but could not see any errors there. Do you have an explanation for this behaviour?

Best, Simon

SimonHegele avatar Mar 31 '25 09:03 SimonHegele

> the model reports an error-rate of ~1.25%.

Is this value output by NanoSim, or did you calculate it yourself after aligning the reads to your reference?

Also, are you generating perfect reads with head/tail sequences? If so, it could help explain this.
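One way to answer both questions independently of IGV is to tally errors and soft clips straight from the alignments. The sketch below is only an illustration, not how NanoSim computes its rates: it assumes SAM text lines (for example from `samtools view genome.sorted.bam`) and that the aligner wrote `NM` tags. The helper name `tally_sam_line` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def tally_sam_line(line):
    """Count aligned, inserted, deleted and soft-clipped bases for one SAM record.

    Mismatches are derived from the NM tag (edit distance), which counts
    mismatches plus inserted plus deleted bases. If the record has no NM
    tag, "mismatch" stays None.
    """
    fields = line.rstrip("\n").split("\t")
    cigar = fields[5]
    counts = {"aligned": 0, "ins": 0, "del": 0, "soft_clip": 0, "mismatch": None}
    if cigar == "*":  # unmapped read or missing CIGAR
        return counts
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            counts["aligned"] += n
        elif op == "I":
            counts["ins"] += n
        elif op == "D":
            counts["del"] += n
        elif op == "S":
            counts["soft_clip"] += n
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("NM:i:"):
            counts["mismatch"] = int(tag[5:]) - counts["ins"] - counts["del"]
    return counts

# Made-up record for illustration: 5 clipped bases, one 2-base insertion, 1 mismatch.
example = "read1\t0\tchr1\t100\t60\t5S90M2I8M\t*\t0\t0\t*\t*\tNM:i:3"
print(tally_sam_line(example))
```

Summing these counts over all primary alignments would show whether the reads themselves carry insertions or mismatches, and whether head/tail sequences appear as soft clips.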

warrenlr avatar Mar 31 '25 17:03 warrenlr

Hi,

Sorry for my late reply.

Read simulation:

simulator.py transcriptome -rt transcriptome.fasta -e chr1_abundance.tsv -c model/training -n 1000000 --no_model_ir -t 128 -o perfect --perfect

Model creation:

read_analysis.py transcriptome -rt transcriptome.fasta -rg genome.fasta -i perfect_aligned_reads.fasta -o model_perfect/training -t 128 --no_intron_retention

I loaded genome.sorted.bam into IGV to check whether the mappings contain any mismatches or indels, but could not find any. However, the error rates in training_error_rate.tsv are not zero:

Mismatch rate: 0.0038233864689006677
Insertion rate: 0.007720972287073042
Deletion rate: 0.0009709514847554787
Total error rate: 0.012515310240729188
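As a quick sanity check on these figures, the short sketch below confirms that the three component rates add up to the reported total (the ~1.25% mentioned earlier) and shows how the total splits across error types. The values are copied from the output above; the variable names are only illustrative.

```python
# Rates as reported in training_error_rate.tsv
mismatch = 0.0038233864689006677
insertion = 0.007720972287073042
deletion = 0.0009709514847554787
total_reported = 0.012515310240729188

total = mismatch + insertion + deletion

# Share of the total error rate contributed by each error type
shares = {
    "mismatch": mismatch / total,
    "insertion": insertion / total,
    "deletion": deletion / total,
}

print(f"sum of components: {total:.6f} (reported: {total_reported:.6f})")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of total")
```

Insertions account for roughly 62% of the reported total, so they are probably the first thing to look for in the alignments.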

SimonHegele avatar Apr 03 '25 08:04 SimonHegele