gtt-gen-SCG-HMMs: Downloading the PFam HMM information file failed

Open naturepoker opened this issue 10 months ago • 3 comments

I'm trying to build a custom SCG set using GToTree's built-in SCG screening tool (which usually works wonderfully, by the way).

Right now I'm getting a "Downloading the PFam HMM information file failed" error (the intended Pfam version is 37.2). I just wanted to ask whether someone else here can replicate this, as a sanity check.

I also checked the tmp-dir created during the process, and it contains a 1.8 GB all-pfams.hmm file, so some sort of Pfam download has taken place. I'm curious what the missing file might be in this case.

Thank you!

naturepoker, Mar 17 '25 16:03

Hey there, @naturepoker!

Thanks for the heads-up about this. It seems the last 2 Pfam releases (37.2 and 37.1) don't have the same structure as earlier ones, and they are missing that info file. Previously it was at database_files/pfamA.txt.gz (e.g., in the 37.0 release), but it doesn't look to be present in the newer ones. This is a bummer, as that file is integral to how GToTree currently does this: info in that file is used to retain only those Pfams that span more than 50% of the underlying proteins...
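To make the "more than 50% of the underlying proteins" criterion concrete, here is a minimal Python sketch of that kind of filter. It is only an illustration: the thread does not say which fields GToTree reads from pfamA.txt.gz or exactly how it computes the span, so the `PfamEntry` fields, the accessions, and the lengths below are all hypothetical.

```python
from typing import NamedTuple


class PfamEntry(NamedTuple):
    # All three fields are hypothetical stand-ins for whatever the info file provides.
    accession: str             # placeholder accession, not a real Pfam ID
    model_length: int          # length of the HMM
    avg_protein_length: float  # mean length of the proteins carrying the domain


def spans_enough(entry, min_fraction=0.5):
    """Return True if the model spans more than min_fraction of the average protein."""
    if entry.avg_protein_length <= 0:
        return False  # no usable length information, so the entry is not kept
    return entry.model_length / entry.avg_protein_length > min_fraction


def filter_pfams(entries, min_fraction=0.5):
    """Return the accessions of the entries that pass the span criterion."""
    return [e.accession for e in entries if spans_enough(e, min_fraction)]


entries = [
    PfamEntry("PF_A", 300, 400.0),  # 0.75 of the protein: kept
    PfamEntry("PF_B", 100, 400.0),  # 0.25 of the protein: dropped
    PfamEntry("PF_C", 200, 400.0),  # exactly 0.50: dropped, since the rule is "more than"
]
print(filter_pfams(entries))
```

Without per-family length information like this, a filter of this kind has nothing to work from, which is presumably why the missing file stops the script.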

I'm gonna have to look into this and see if I can find similar info somewhere else, and maybe provide an option to specify the desired version of Pfam to use (which would be helpful anyway).

I'll try to get this resolved ASAP. In the meantime, if you want to move forward as usual with version 37.0, here is something I can't test at the moment: try changing line 524 of gtt-gen-SCG-HMMs to return "37.0", which for now might be enough to let it just use 37.0. Otherwise, I'll get back to this issue thread ASAP.

Thanks again for posting!

AstrobioMike, Mar 17 '25 17:03

Ah, that's good to know. Shame about Pfam changing their database around, but I'm happy I could help a bit.

For those reading this, the workaround is as @AstrobioMike described. Just find the location of your gtt-gen-SCG-HMMs file via:

whereis gtt-gen-SCG-HMMs

Make a backup copy (just in case) and add "37.0" on the same line as 'return' (line 524), so that it reads return "37.0". This overrides the output of the function get_latest_pfam_version.
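For anyone who prefers to script the locate-and-back-up part, here is a rough Python sketch. It only finds gtt-gen-SCG-HMMs on the PATH, writes a backup copy next to it, and prints line 524 so you can confirm it is the 'return' line before editing by hand. The helper names `backup_script` and `show_line` and the `.bak` suffix are my own, not part of GToTree, and line 524 may differ between GToTree versions.

```python
import shutil
from pathlib import Path


def backup_script(script_path):
    """Copy the script to <name>.bak in the same directory and return the backup path."""
    src = Path(script_path)
    dst = src.with_name(src.name + ".bak")
    shutil.copy2(src, dst)  # copy2 also preserves permissions and timestamps
    return dst


def show_line(script_path, line_number):
    """Return the given 1-indexed line without its trailing newline, or None if absent."""
    with open(script_path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if number == line_number:
                return line.rstrip("\n")
    return None


if __name__ == "__main__":
    located = shutil.which("gtt-gen-SCG-HMMs")
    if located is None:
        print("gtt-gen-SCG-HMMs not found on PATH")
    else:
        print("backup written to", backup_script(located))
        print("line 524:", show_line(located, 524))
```

The edit itself is left manual on purpose: checking the printed line first avoids changing the wrong line if the script has shifted since this thread.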

I just tested it out and the script built a new SCG set without any issues.

I'll close this one but please feel free to open it up to add comments/updates!

naturepoker, Mar 17 '25 18:03

Great, thanks for testing and reporting back that this will let people skate around the issue for the moment.

I'm going to keep this open until I put in a legit fix (even if it means relegating gtt-gen-SCG-HMMs to Pfam 37.0 only).

For my own note, I've put in a question at the InterPro API repo (https://github.com/ProteinsWebTeam/interpro7-api/issues/175) as one step towards figuring out if I can still get that info somewhere.

Thanks again!

AstrobioMike, Mar 17 '25 19:03