Predicted proteins not starting with M?
Hello ! I am using Metaeuk through BUSCO on my genome in order to do some gene prediction.
I would expect the vast majority of the single-copy, full-length proteins detected to start with a methionine. However, this is not the case.
I went through the predicted BUSCOs and several of them start with another amino acid. Is there a step in MetaEuk that checks the starting amino acid? I am using the glires database from ODB on a not yet annotated genome, with MetaEuk version 5.34c21f2. I installed BUSCO (and therefore MetaEuk as well) using their conda installation (BUSCO v5.3) on an Ubuntu operating system.
Best,
Timothee
Hi Timothee,
Thank you for the comment. I am marking this as a future feature to develop. Right now it is not possible to require that proteins start with a methionine. There can be several reasons why some of your proteins do not start with M: (1) some proteins simply don't; (2) your contigs may be very fragmented, so you get a lot of partial proteins; (3) your organism may not be very similar to those in the target database, in which case homology detection is harder and some parts of the proteins (potentially the start) match poorly.
If this concerns you, I would look at a couple of things: (1) What fraction of the proteins do not start with M? Does it make sense for the taxonomic group you're investigating? How does it correlate with the E-value (do the proteins missing an M have worse E-values)? (2) Can you manually check a couple of examples? Does it look like there is an M upstream that was not detected?
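For check (1), a minimal Python sketch along these lines could give a first answer. It is not part of MetaEuk or BUSCO: the function names `read_fasta` and `summarize_starts` are made up for illustration. It also assumes that the FASTA headers are pipe-delimited with the E-value in the fifth field, as in MetaEuk's own output; the files BUSCO writes may use a different header layout, so adjust `evalue_field` (or ignore the E-values) if that assumption does not hold for your files.

```python
from io import StringIO
from statistics import median


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_starts(records, evalue_field=4):
    """Split records by whether the protein starts with M.

    ASSUMPTION: the header is pipe-delimited and field `evalue_field`
    (0-based) holds the E-value. If it cannot be parsed, the E-value
    is stored as None.
    Returns (fraction_without_M, with_m, without_m), where the two
    lists hold (header, evalue) tuples.
    """
    with_m, without_m = [], []
    for header, seq in records:
        fields = header.split("|")
        try:
            evalue = float(fields[evalue_field])
        except (IndexError, ValueError):
            evalue = None
        target = with_m if seq.upper().startswith("M") else without_m
        target.append((header, evalue))
    total = len(with_m) + len(without_m)
    fraction = len(without_m) / total if total else 0.0
    return fraction, with_m, without_m


if __name__ == "__main__":
    # Toy input with made-up headers; replace with open("your_proteins.fas").
    demo = StringIO(
        ">T1|contig1|+|250|1e-60|2|10|400\nMKVLAA\nGGT\n"
        ">T2|contig2|-|80|1e-08|1|5|120\nKVLTTP\n"
    )
    fraction, with_m, without_m = summarize_starts(read_fasta(demo))
    print(f"fraction not starting with M: {fraction:.2f}")
    for label, group in (("with M", with_m), ("without M", without_m)):
        evalues = [e for _, e in group if e is not None]
        if evalues:
            print(f"median E-value {label}: {median(evalues):.1e}")
```

If the proteins without a leading M have clearly worse E-values than the others, that would point towards reason (3) above; many short proteins sitting at contig ends would point towards reason (2).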