ORF predictions with a non-standard genetic code are sometimes wrong.
I'm using the latest mfannot via the Docker image.
I'm annotating a fungal mt genome with genetic code 4, and some of the predicted ORFs are problematic. I have cross-validated them with NCBI ORFfinder; most of the affected ORFs have a non-ATG start codon.
e.g.
42090 TAAATTATGATTGTTGGGGTAAATATTATAAAATATCCGCTTATTCATTTGGTATTAA
; G-orf782 ==> start ;; contain dpo
42148 AACATTACTTTTAACAAAAATATTGCTTTTATTCAAGTGGATAAAGGAAATGATAAAAAT
...
44476 AAACCTTTAAAC
;; G-dpo_2 ==> end
44488 TTTATTGTT
; G-orf782 ==> end
44497 TAACCCTTTGGCTTTACACTACTTTGTCTTATCTTTTTAGTTCGGCTAATCTTAGTGGCA
The start codon should be ATT (at 42151 bp) and the stop codon should be TAA (at 44497 bp). The protein sequence in the .sqn file is ncbieaa "-ITFNKNIAFI....
56519 TTGGGGGAGTTAACAAATAATAAAATAATAAAATAATAAAAT
; G-orf510 ==> start /note=LAGLIDADG ;; evalue:1.6e-28
56561 AATAAAATAATAAATACAAATATGAGAATCTTAATACAAATATTTACAAATTCTGACTTA
...
58061 ATAAAATCTAACATGAATATGAATAGAAGTTAA
; G-orf510 ==> end
58094 TATAATTTCATATGGTTTGCTAGTTAACCCCGTTCAAAATCAGACCAACTACTAATACAA
AAT is not a valid start codon; the start should be ATA (at 56567 bp). The protein sequence in the .sqn file is ncbieaa "-KIINTNMRI....
I also got an error message:
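For anyone who wants to reproduce the two positions above, here is a minimal Python sketch that scans in frame from the annotated ORF start for the first codon accepted as a start in genetic code 4. The start-codon set is hard-coded from my reading of the NCBI translation table 4 listing (an assumption, please verify it against the NCBI page), and `first_valid_start` is a hypothetical helper, not part of mfannot. Taking the first in-frame start is only a heuristic; ORFfinder or mfannot may apply different rules.

```python
# Start codons for NCBI translation table 4 (assumption: copied from the
# NCBI genetic code listing; verify before relying on it).
TABLE4_STARTS = {"TTA", "TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def first_valid_start(seq, first_pos, starts=TABLE4_STARTS):
    """Return (genome position, codon) of the first in-frame start codon.

    seq       -- nucleotide string beginning at the annotated ORF start
    first_pos -- 1-based genome coordinate of seq[0]
    Returns None if no codon in this frame is in `starts`.
    """
    seq = seq.upper()
    for offset in range(0, len(seq) - 2, 3):
        codon = seq[offset:offset + 3]
        if codon in starts:
            return first_pos + offset, codon
    return None


# First sequence line of each ORF, copied from the mfannot output above.
orf782 = "AACATTACTTTTAACAAAAATATTGCTTTTATTCAAGTGGATAAAGGAAATGATAAAAAT"
orf510 = "AATAAAATAATAAATACAAATATGAGAATCTTAATACAAATATTTACAAATTCTGACTTA"

print(first_valid_start(orf782, 42148))  # (42151, 'ATT')
print(first_valid_start(orf510, 56561))  # (56567, 'ATA')
```

Both results agree with the start positions I expected from ORFfinder.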
...
7) Annotate genes with introns...
Use of uninitialized value $pb in string eq at /usr/local/bin/mfannot line 1610.
8) Identify gene fusions...
...
It seems to be dealing with frame-shift annotations; I don't know whether this is related to the problem.
@cgjosephlee, thanks for reporting this issue. Could I have the full sequence to work on it? It would be greatly appreciated.
Try this: https://www.ncbi.nlm.nih.gov/nuccore/CM008263.1
Hi @cgjosephlee,
The issue_16 branch should fix this issue. At this point I only have a Singularity container here containing the new code.
I have some other work to do before merging the code into the main branch, but if you want to test with the Singularity version, it would be appreciated.
Thanks for your comments and your help.