URL update needed for the latest MEROPs database file

Open calizilla opened this issue 1 year ago • 5 comments

Are you using the latest release? v 1.8.17

Describe the bug: funannotate downloads this MEROPS database file, which has 5009 genes and was last updated in 2019: https://ftp.ebi.ac.uk/pub/databases/merops/current_release/merops_scan.lib

The latest file (updated 2023) contains 5098 genes and has URL: https://ftp.ebi.ac.uk/pub/databases/merops/current_release/meropsscan.lib
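Entry counts like the ones above can be checked locally. The following is a minimal sketch, assuming the `.lib` files are FASTA-formatted with one `>` header per sequence (the `fasta` variable name in setupDB.py suggests this). The function name `count_fasta_records` is hypothetical and not part of funannotate.

```python
import io


def count_fasta_records(handle):
    """Count sequences in a FASTA stream by counting '>' header lines.

    Assumption: the MEROPS .lib file is plain FASTA, one header per entry.
    """
    return sum(1 for line in handle if line.startswith(">"))


# Tiny in-memory demo with made-up records (not real MEROPS entries).
demo = io.StringIO(">seq1 example\nMKTAYIAK\n>seq2 example\nMLLAVL\n")
n_records = count_fasta_records(demo)
```

For a downloaded file, the same function can be called on `open(path)` for each of the two library files to compare their sizes.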

Simple change to line 141 of script funannotate/setupDB.py from:

fasta = os.path.join(FUNDB, 'merops_scan.lib')

to:

fasta = os.path.join(FUNDB, 'meropsscan.lib')

and change line 199 of funannotate/resources.py from:

"merops": "https://ftp.ebi.ac.uk/pub/databases/merops/current_release/merops_scan.lib",

to:

"merops": "https://ftp.ebi.ac.uk/pub/databases/merops/current_release/meropsscan.lib",

will resolve the issue.
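Because the filename changed between releases, a more defensive option than hardcoding one name might be to try several candidate names in order. This is only a sketch under stated assumptions, not funannotate's implementation: `candidate_urls`, `pick_first_available`, and `url_exists` are hypothetical helpers, and the candidate list simply contains the two filenames mentioned in this issue.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Base URL and filenames are taken from this issue; the ordering
# (new name first, old name as fallback) is an assumption.
MEROPS_BASE = "https://ftp.ebi.ac.uk/pub/databases/merops/current_release/"
CANDIDATE_NAMES = ("meropsscan.lib", "merops_scan.lib")


def candidate_urls(base=MEROPS_BASE, names=CANDIDATE_NAMES):
    """Build the full URL for each candidate filename."""
    return [urljoin(base, name) for name in names]


def url_exists(url, timeout=10):
    """Return True if a HEAD request to the URL succeeds (needs network)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def pick_first_available(urls, exists=url_exists):
    """Return the first URL for which exists(url) is true, else None."""
    for url in urls:
        if exists(url):
            return url
    return None
```

Passing the check as the `exists` argument keeps the selection logic testable offline, for example `pick_first_available(candidate_urls(), exists=lambda u: True)`.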

calizilla avatar Jul 11 '24 01:07 calizilla

I'll make this change, but ultimately it looks like a bug/problem on the MEROPS side that the latest release does not use the same file name. Did you also inform them of this issue? It seems like this will bite a lot of people who assume the filename structure stays the same between releases.

hyphaltip avatar Jul 15 '24 15:07 hyphaltip

@hyphaltip thanks for the fix; and fair point - I just emailed [email protected] to advise of the issue and suggested they maintain copies at both filenames

calizilla avatar Jul 17 '24 11:07 calizilla

I pushed the new version as the default. It required a manual change to the code, as the version is hardcoded in the code, @nextgenusfs? We can fix this in funannotate2, though I wish EBI would provide the version number in a parseable form in their repository.

hyphaltip avatar Aug 12 '24 21:08 hyphaltip

@hyphaltip thanks. I still have not heard back from EBI regarding the issue on their end.

Just wondering why funannotate chooses to use the meropscan.lib database rather than pepunit.lib? I have now re-annotated my genomes (one fungus and one plant) using pepunit.lib and obtained far more MEROPS hits against pepunit (see the table below). The numbers are counts of unique MEROPS-annotated genes, not the total number of hits to the respective database.

genome   meropscan.lib   pepunit.lib
plant    1180            2676
fungus   492             1804
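The difference between unique annotated genes and total hits can be illustrated with a short sketch. It assumes a tab-delimited hits table whose first column is the query gene ID (as in BLAST-style tabular output); the actual format of the search results is not stated in this thread, and `count_unique_queries` is a hypothetical name.

```python
def count_unique_queries(lines):
    """Return (unique_genes, total_hits) for tab-delimited hit lines.

    Assumption: column 1 holds the query gene ID; blank lines and
    lines starting with '#' are skipped.
    """
    genes = set()
    total_hits = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        total_hits += 1
        genes.add(line.split("\t")[0])
    return len(genes), total_hits


# Made-up example: gene1 has two hits, gene2 has one.
example_hits = ["gene1\tMER0001", "gene1\tMER0002", "gene2\tMER0001"]
unique_genes, total_hits = count_unique_queries(example_hits)
```

Here the made-up input yields two unique genes from three hits, which is the kind of per-gene count reported in the table.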

calizilla avatar Aug 14 '24 00:08 calizilla

That's a question for Jon (@nextgenusfs); he implemented this.

In my own work, if I am doing comparative genomics, I end up running my suite of protein domain profiling on the predicted proteins rather than really worrying about the annotation that is part of the final GenBank record, as I would likely want to run this against the most up-to-date versions of the DBs. So it's good you can get your own results for the DB you want rather than necessarily depending on funannotate for that, as these are just added annotations in GenBank files.

I'm not familiar with the nuances of these MEROPS files anyway, so if you have an explanation of what each provides, maybe there is a better one for the general goals of the toolkit here.

hyphaltip avatar Aug 19 '24 16:08 hyphaltip