edit sequence error
Cam_Hsc_genes_v1_UTRs-Hsc_scaff001-1..6046013.gff3.gz
From GenSAS / Jodi Huffman
A GenSAS user reported this, and I am pretty sure it’s an Apollo error, but I just don’t know how to explain it.
Our user noticed that when he used the “Get Sequence” option on his gene model in the User-created Annotations track, a base was missing in the middle of the exon.
The base in question is highlighted in this screenshot:
When I use the “Get Sequence” function of Apollo, the “A” (reverse strand) is missing from both the cDNA and the genomic sequence (end of blue text):
>65ce83c2-4679-412f-aa07-2693cade723a (sequence:exon) 179 residues [Hsc_scaff001:1660380-1660559 - strand] [cdna]
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTTTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA
>65ce83c2-4679-412f-aa07-2693cade723a (sequence:exon) 179 residues [Hsc_scaff001:1660380-1660559 - strand] [genomic]
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTTTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA
But when I look at the sequence of the gene model that was dragged to the User-created Annotations track, the missing base is there (in red):
>Hsc_scaff001 Hsc_scaff001:1657217..1660839 (- strand) class=gene length=3623 (I removed the extra sequence, so coordinates are different)
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTATTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA
I think it’s really odd that a base is missing in the middle of the gene. I could see how this would happen at the ends, but not in the middle. Since GenSAS uses coordinates from the user-created data, downstream steps should be fine, but this is really bad for curators who use “Get Sequence” to grab the protein/gene sequence for quick BLAST searches and whatnot.
I have never noticed this before, but I haven’t done a ton of manual curation where I might notice it either. Just wanted to pass the observation along.
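The missing base described above can be pinpointed by comparing the “Get Sequence” output against the sequence of the dragged gene model. This is a minimal Python sketch, assuming the two sequences differ by exactly one deleted base; the function name `find_single_deletion` is hypothetical, and the two strings are short excerpts copied from the sequences pasted above:

```python
def find_single_deletion(reference, observed):
    """Return (index, base) of the one reference base missing from observed.

    Assumes observed equals reference with exactly one base removed.
    Inside a run of identical bases the exact position is ambiguous.
    """
    if len(reference) != len(observed) + 1:
        raise ValueError("expected observed to be exactly one base shorter")
    # Walk the shared prefix until the two sequences disagree.
    i = 0
    while i < len(observed) and reference[i] == observed[i]:
        i += 1
    # Dropping reference[i] must reproduce the observed sequence.
    if reference[:i] + reference[i + 1:] != observed:
        raise ValueError("sequences differ by more than a single deletion")
    return i, reference[i]


# Excerpts around the discrepancy (not the full exon).
genomic_excerpt = "GTTCGTATTGCGATGAATGG"  # from the gene model sequence
apollo_excerpt = "GTTCGTTTGCGATGAATGG"    # from the "Get Sequence" output

index, base = find_single_deletion(genomic_excerpt, apollo_excerpt)
print(f"missing base {base!r} at offset {index} of the excerpt")
```

On these excerpts the check reports the `A` at offset 6, which matches the base highlighted in the report.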
Hi, I'm a new user of Apollo, and I have two closely related issues on the same gene model. 1st: when I right-click on the exon to get the CDS sequence, a G is inserted (red arrow), which shifts the reading frame and therefore breaks the amino acid translation and the splice site analysis (black arrow).

2nd: when I right-click on the exon to get the CDS sequence, a T is deleted (red arrow), which causes the same problems with translation, splice sites (black arrow), etc.

As you can imagine, it's a nightmare to get the right amino acid sequence.
Hope this can help ...
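To illustrate why a single inserted base is so disruptive, here is a small Python sketch that splits a CDS into codons before and after an insertion. The sequence is made up for illustration (it is not taken from the gene model in question), and the helper name `codons` is hypothetical:

```python
def codons(cds):
    """Split a CDS into consecutive triplets, dropping a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


# Made-up CDS, used only to show the effect of one extra base.
original = "ATGGCCAAGTTTGGA"
# Insert a single G after the second codon.
with_insertion = original[:6] + "G" + original[6:]

print(codons(original))
print(codons(with_insertion))
```

Every codon downstream of the insertion changes, so the translated protein diverges from that point on. A single deleted base has the same effect in the other direction.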
Can you verify the version of Apollo you are using?
I'm not seeing it with Apollo 2.6.1, but it's possible there are some additional steps I need to take, or that I'm missing something:

Are you able to reproduce on the demo instance (I would recommend using Honeybee), which is running 2.6.1?
https://genomearchitect.readthedocs.io/en/latest/Demo.html
I'm using the Apollo available in GenSAS; the version may be "old":
Apollo Genome Annotator
Version: 2.0.7-snapshot
Grails version: 2.5.5
Groovy version: 2.4.4
JVM version: 1.8.0_252
I'm not sure I'll be able to reproduce it on the demo instance; I'll have a look.
@Jolivares-INRAE send me an email (nathandunn at lbl.gov) if you want admin access so that you can upload genomes. However, the case should be the same. Use the honeybee organism.
@Jolivares-INRAE also, if you get the FASTA / GFF3 from that organism, I can just reproduce it locally to see if it's already been fixed:

Upload it here or email me a link nathandunn @ lbl.gov
