Apollo edit sequence error

Annotations.gff3.gz

Cam_Hsc_genes_v1_UTRs-Hsc_scaff001-1..6046013.gff3.gz

From GenSAS / Jodi Huffman

A GenSAS user reported this, and I am pretty sure it’s an Apollo error, but I just don’t know how to explain it.

Our user noticed that when he used the “Get Sequence” option on his gene model in the User-created Annotations track, a base was missing in the middle of the exon.

The base in question is highlighted in this screenshot:

When I use the “Get Sequence” function of Apollo, the “A” (reverse strand) is missing from both the cDNA and the genomic sequence (end of the blue text):

>65ce83c2-4679-412f-aa07-2693cade723a (sequence:exon) 179 residues [Hsc_scaff001:1660380-1660559 - strand] [cdna]
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTTTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA

>65ce83c2-4679-412f-aa07-2693cade723a (sequence:exon) 179 residues [Hsc_scaff001:1660380-1660559 - strand] [genomic]
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTTTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA

But when I look at the sequence of the gene model that was dragged to the User-created Annotations track, the missing base is there (shown in red in the original post; it is the A in ...GTTCGTATTGCG...):

>Hsc_scaff001 Hsc_scaff001:1657217..1660839 (- strand) class=gene length=3623 (I removed the extra sequence, so coordinates are different)
CCTTGCCATGTGCTACAAAAGTGTTTTGTCCGGCGGAGGGAATTTTCCGTGGAAAGTGTCGCGTCGCTTTCGCTTCTTTTCATATTTGTTGGATAAAAGCGGAAAATGTGGCAAAAATGGCAAAAAAGTTCGTATTGCGATGAATGGCACGCAATTTTCCGTGGGAAGAGAATGCTTCAA
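To pin down exactly where the two sequences diverge, a short comparison is enough. This is a minimal sketch, assuming the “Get Sequence” output differs from the track sequence by exactly one deleted base; `find_deleted_base` is a hypothetical helper, and the two strings are short excerpts copied from the sequences above.

```python
def find_deleted_base(shorter: str, longer: str) -> int:
    """Return the 0-based index in `longer` of the base missing from `shorter`.

    Assumes `longer` is `shorter` with exactly one extra base. Inside a
    homopolymer run the deletion is ambiguous; this reports the last
    position of the run.
    """
    if len(longer) != len(shorter) + 1:
        raise ValueError("sequences must differ in length by exactly one")
    for i, (a, b) in enumerate(zip(shorter, longer)):
        if a != b:
            break
    else:
        i = len(shorter)  # no mismatch: the extra base is the final one
    # Confirm that removing that single base really reproduces `shorter`.
    if longer[:i] + longer[i + 1:] != shorter:
        raise ValueError("sequences differ by more than a single deletion")
    return i


if __name__ == "__main__":
    apollo = "AAAAAAGTTCGTTTGCGATGAATGG"   # excerpt of the "Get Sequence" output
    track = "AAAAAAGTTCGTATTGCGATGAATGG"   # same region from the user-created track
    i = find_deleted_base(apollo, track)
    print(i, track[i])  # prints: 12 A
```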

I think it’s really odd that a base is missing in the middle of the gene. I could see how this would happen at the ends, but not in the middle. Since GenSAS uses coordinates from the user-created data, downstream steps should be fine, but this is really bad for curators who use “Get Sequence” to grab the protein or gene sequence for quick BLAST searches and whatnot.
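The intuition that a coordinate mistake would only clip the ends can be checked with a toy example. This sketch assumes 1-based, inclusive coordinates (as in GFF3) and a made-up genome string; `extract` and `extract_off_by_one` are hypothetical helpers, not Apollo code. Treating the inclusive end as exclusive loses one base at an end of the feature (on the minus strand, the first base after reverse-complementing), never one in the middle.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(genome: str, start: int, end: int, strand: str) -> str:
    """Correct extraction for 1-based, inclusive start/end coordinates."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates out of range")
    sub = genome[start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub


def extract_off_by_one(genome: str, start: int, end: int, strand: str) -> str:
    """Deliberately buggy: treats the inclusive end as exclusive."""
    sub = genome[start - 1:end - 1]
    return reverse_complement(sub) if strand == "-" else sub


if __name__ == "__main__":
    genome = "AACCGGTTAC"  # toy sequence, not taken from the post
    good = extract(genome, 3, 8, "-")
    bad = extract_off_by_one(genome, 3, 8, "-")
    print(good, bad)  # prints: AACCGG ACCGG
    assert bad == good[1:]  # only the terminal base is lost
```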

I have never noticed this before, but I haven’t done a ton of manual curation where I might notice it either. Just wanted to pass the observation along.

Apr 10 '19 20:04 nathandunn

Hi, I'm a new user of Apollo, and I have two closely related issues on the same gene model, described here. First: when I right-click on the exon to get the CDS sequence, a G is inserted (red arrow). This shifts the reading frame, which throws off the amino acid translation and the splice site analysis (black arrow).

Apollo1

Second: when I right-click on the exon to get the CDS sequence, a T is deleted (red arrow), with the same problems for the translation, splice sites (black arrow), etc.

Apollo2
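Why a single inserted or deleted base is so damaging: every codon downstream of it is read in a different frame. A minimal sketch using the standard genetic code (NCBI translation table 1); the toy CDS strings are made up for illustration and are not the sequences from the screenshots.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate complete codons; a trailing partial codon is ignored.

    Codons containing ambiguous bases (e.g. N) become 'X'.
    """
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


if __name__ == "__main__":
    reference = "ATGAAATTTGGG"   # ATG AAA TTT GGG
    inserted = "ATGGAAATTTGGG"   # extra G after the start codon
    deleted = "ATGAAATTGGG"      # one T removed
    print(translate(reference))  # prints: MKFG
    print(translate(inserted))   # prints: MEIW
    print(translate(deleted))    # prints: MKL
```

Only the first codon survives either edit; everything after the indel is a different protein.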

As you can imagine, it's a nightmare to get the right amino acid sequence.

Hope this helps...

Sep 14 '20 07:09 Jolivares-INRAE

Can you verify the version of Apollo you are using?

I'm not seeing it in version 2.6.1 of Apollo, but it's possible there are additional steps I need to take or something I'm missing:

Are you able to reproduce on the demo instance (I would recommend using Honeybee), which is running 2.6.1?

https://genomearchitect.readthedocs.io/en/latest/Demo.html

Sep 14 '20 18:09 nathandunn

I'm using the Apollo instance available in GenSAS; the version may be "old":

Apollo Genome Annotator

Version: 2.0.7-snapshot
Grails version: 2.5.5
Groovy version: 2.4.4
JVM version: 1.8.0_252

I'm not sure I'll be able to reproduce it on the demo instance, but I'll have a look.

Sep 15 '20 06:09 Jolivares-INRAE

@Jolivares-INRAE send me an email (nathandunn at lbl.gov) if you want admin access so that you can upload genomes. However, the behavior should be the same; use the Honeybee organism.

Sep 15 '20 17:09 nathandunn

@Jolivares-INRAE also, if you get the FASTA / GFF3 from that organism I can just reproduce it locally to see if it's already been fixed:

Upload it here or email me a link nathandunn @ lbl.gov
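For anyone wanting to cross-check “Get Sequence” outside Apollo, exon coordinates can be pulled straight from an exported GFF3 file such as the Annotations.gff3.gz attached to the original report. This is a sketch, assuming a standard nine-column GFF3 export in which exons point to their transcript through a `Parent` attribute; `parse_exons` and `read_exons` are hypothetical helpers, and percent-encoded attribute values are not decoded.

```python
import gzip
from typing import Iterable, List, Tuple

Exon = Tuple[str, int, int, str]  # (seqid, start, end, strand), 1-based inclusive


def parse_exons(lines: Iterable[str], parent_id: str) -> List[Exon]:
    """Collect exon rows whose Parent attribute includes `parent_id`."""
    exons: List[Exon] = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if parent_id in attrs.get("Parent", "").split(","):
            exons.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return sorted(exons, key=lambda e: e[1])


def read_exons(path: str, parent_id: str) -> List[Exon]:
    """Read exons from a plain or gzip-compressed GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_exons(handle, parent_id)
```

For example, `read_exons("Annotations.gff3.gz", "TRANSCRIPT_ID")` (with a real transcript ID substituted) returns the exons in genomic order; each one can then be sliced from the FASTA and, for minus-strand genes, reverse-complemented and compared with Apollo's output.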

Apr 14 '21 20:04 nathandunn