
Add createdb support for sequence db input

Open matchy233 opened this issue 3 years ago • 3 comments

According to the source code of createdb, it should accept MMseqs2 databases as one of its input sources, but the current implementation fails to handle MMseqs2 db input. This PR fixes this issue.

MMseqsBase.cpp probably needs to be edited with the new usage instructions as well.

matchy233 avatar Mar 02 '22 08:03 matchy233

I can modify MMseqsBase.cpp later to add instructions for the fixed version of createdb, if needed.

matchy233 avatar Mar 16 '22 15:03 matchy233

This feature was meant for turning a bunch of FASTA files stored in the form of a DB (e.g., produced by tar2db) into a normal MMseqs2 sequence database. It is used for this purpose in the databases downloader workflow.

If you want to consume sequence dbs and produce new sequence dbs, I would suggest adding a check for the presence of a header db and running your new code only if one is found.
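A minimal sketch of what such a check could look like, not the actual createdb code. It assumes the header db of a sequence db at `<db>` is stored next to it as `<db>_h` with an index at `<db>_h.index`; `fileExists` and `hasHeaderDb` are hypothetical helper names introduced only for illustration.

```cpp
#include <fstream>
#include <string>

// Returns true if the file at `path` can be opened for reading.
static bool fileExists(const std::string &path) {
    std::ifstream in(path.c_str());
    return in.good();
}

// Hypothetical helper: reports whether a header db accompanies `dbPath`.
// Assumption: the header db lives at "<dbPath>_h" with its index at
// "<dbPath>_h.index".
static bool hasHeaderDb(const std::string &dbPath) {
    return fileExists(dbPath + "_h") && fileExists(dbPath + "_h.index");
}
```

With something like this, createdb could take the new sequence db path only when `hasHeaderDb(input)` is true, and otherwise fall back to the existing behavior for DBs of FASTA files.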

milot-mirdita avatar Mar 16 '22 16:03 milot-mirdita

Thanks for the explanation! I'll modify the code to support the old behavior as well as the new one. Maybe we can also describe database input in the usage text, so that curious users (like me) don't get confused next time :P

matchy233 avatar Mar 16 '22 16:03 matchy233