Error: createbintaxonomy failed with malloc error
Expected Behavior
Taxonomy database created based on a seqdb created from UniProt sequences
Current Behavior
The program crashes with a core dump and reports Error: createbintaxonomy failed.
Steps to Reproduce (for bugs)
Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
mmseqs createdb "uniprot_2024_03.fasta" seqdb
then
mmseqs createtaxdb seqdb tmp
We attempted to vary --tax-db-mode, --tax-mapping-mode, and --threads parameters but observed the same behavior. Any help would be highly appreciated.
We are able to reproduce this issue with a minimal database containing 1000 sequences.
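For anyone who wants to try the reproduction on a small input, here is a minimal sketch of how such a 1000-sequence subset could be cut from the FASTA file. The helper name `fasta_head` is hypothetical and is not part of MMseqs2; it only assumes that each record starts with a `>` header line.

```shell
# Print the first N records of a FASTA file (N defaults to 1000).
# A record is a ">" header line plus the sequence lines that follow it.
fasta_head() {
    awk -v n="${2:-1000}" '
        /^>/  { c++ }      # count header lines
        c > n { exit }     # stop once record N+1 begins
              { print }
    ' "$1"
}
```

A possible use, with `subset.fasta` and `seqdb_small` as placeholder names: `fasta_head uniprot_2024_03.fasta 1000 > subset.fasta`, followed by `mmseqs createdb subset.fasta seqdb_small` and `mmseqs createtaxdb seqdb_small tmp` with a freshly created, empty `tmp` folder.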
MMseqs Output (for bugs)
> mmseqs createtaxdb seqdb tmp
createtaxdb seqdb tmp
MMseqs Version: 15.6f452
NCBI tax dump directory
Taxonomy mapping file
Taxonomy mapping mode 0
Taxonomy db mode 1
Threads 48
Verbosity 3
Loading nodes file ... Done, got 2601214 nodes
Loading merged file ... Done, added 79743 merged nodes.
Loading names file ... Done
mmseqs: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)
Error: createbintaxonomy failed
Context
We are trying to create a custom taxonomy database for MSA, such that the resulting .a3m files contain taxonomy information.
Is a taxonomy database already available for download for uniprot_2024_03 or similar releases?
Your Environment
Include as many relevant details about the environment you experienced the bug in.
- Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
- Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
- For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
- Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
- Operating system and version:
Linux 64-bit, 256G memory, MMseqs Version: 15.6f452
Having the same issue. Any insights on how to address this, please?
I was also getting this error earlier, and interestingly only on Linux (it seemed to work fine on macOS for me).
Strangely, it started working again for me about 10 minutes ago, despite having the exact same setup as when I was getting the error.
MMseqs2 Version: 45111b641859ed0ddd875b94d6fd1aef1a675b7e
Does this happen with the databases download of UniProt, or only if you call createtaxdb manually?
databases goes through a separate branch to extract taxonomic information from UniProt-based databases and should not be affected.
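To illustrate what extracting taxonomic information from UniProt headers can look like, here is a rough sketch. It is not the code that `databases` actually runs. It assumes UniProt-style FASTA headers of the form `>sp|ACCESSION|NAME ... OX=<taxid> ...`, and the function name `uniprot_taxmap` is made up for this example.

```shell
# Print "accession<TAB>taxid" for every FASTA header that carries an OX= field.
# Assumption: headers look like ">sp|P12345|NAME_SPECIES ... OX=9606 ...".
uniprot_taxmap() {
    awk '/^>/ {
        split($1, id, "|")              # id[2] holds the accession
        if (match($0, / OX=[0-9]+/))    # RSTART/RLENGTH mark " OX=<digits>"
            print id[2] "\t" substr($0, RSTART + 4, RLENGTH - 4)
    }' "$1"
}
```

Headers without an `OX=` field are skipped silently, so comparing the number of output rows with the number of sequences is a quick sanity check on the input.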
Thanks for supporting us! This happens when calling createtaxdb manually.
Would you please elaborate on what databases download entails in this context?
To zoom out a bit: is there a feasible mmseqs2 command to generate .a3m files with correct UniRef100 taxonomy identifiers without going through this custom database setup procedure?
mmseqs databases UniProtKB uniprot tmp
should download the latest uniprot and set it up correctly for use with MMseqs2 including taxonomy information.
Thanks for this. We ran this command and obtained the main database files, including uniprot_h, uniprot.index, etc. However, we still need some help understanding the next steps for assigning taxonomy IDs to alignments. Following https://github.com/sokrypton/ColabFold/issues/216, here is what we tried:
mmseqs convertalis test/qdb uniprot test/res_exp test/res_exp_realign.m8 --format-output query,target,taxid,taxname,taxlineage,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,cigar
and it raised the following error:
Loading NCBI taxonomy
names.dmp, nodes.dmp, merged.dmp from NCBI taxdump could not be found!
Is there something that we are missing here? Thanks!
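One thing that may help narrow this down is checking which taxonomy files actually sit next to the database prefix passed to convertalis. The sketch below does only that. The suffix list is an assumption about how MMseqs2 names its taxonomy companion files and may not match every version, and `check_taxfiles` is a hypothetical helper name.

```shell
# Report which taxonomy companion files exist for a database prefix.
# Assumption: the suffixes below reflect MMseqs2's naming; adjust if needed.
check_taxfiles() {
    for suffix in _mapping _taxonomy _nodes.dmp _names.dmp _merged.dmp; do
        if [ -e "$1$suffix" ]; then
            echo "found   $1$suffix"
        else
            echo "missing $1$suffix"
        fi
    done
}
```

Running `check_taxfiles uniprot` from the directory that holds the downloaded database would show whether any taxonomy files were created alongside uniprot_h and uniprot.index.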