MMseqs2

Error: createbintaxonomy failed with malloc error

Open zrqiao opened this issue 1 year ago • 6 comments

Expected Behavior

Taxonomy database created based on a seqdb created from UniProt sequences

Current Behavior

The program aborts with a core dump and reports Error: createbintaxonomy failed.

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

mmseqs createdb "uniprot_2024_03.fasta" seqdb

then

mmseqs createtaxdb seqdb tmp 

We attempted to vary --tax-db-mode, --tax-mapping-mode, and --threads parameters but observed the same behavior. Any help would be highly appreciated.

We are able to reproduce this issue with a minimal database containing 1000 sequences.
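
For anyone who wants to try the same thing on a small input, here is a rough sketch of how such a subset could be built and run against a fresh tmp folder. The helper first_n_records and the names subset.fasta, subset_db and tmp_subset are made up for illustration and are not the exact files from this report; the mmseqs steps only run if the FASTA file and the mmseqs binary are present.

```shell
#!/bin/sh
# Hypothetical helper (not part of MMseqs2): print the first N records of a FASTA file.
first_n_records() {
    # $1 = FASTA path, $2 = number of records to keep
    awk -v n="$2" '/^>/ { seen++ } seen > n { exit } { print }' "$1"
}

FASTA="uniprot_2024_03.fasta"

# Only run the reproduction when both the input and mmseqs are available.
if [ -f "$FASTA" ] && command -v mmseqs >/dev/null 2>&1; then
    first_n_records "$FASTA" 1000 > subset.fasta   # illustrative output name
    rm -rf tmp_subset && mkdir tmp_subset          # newly recreated, empty tmp folder
    mmseqs createdb subset.fasta subset_db
    mmseqs createtaxdb subset_db tmp_subset
fi
```

The awk filter stops at the header of record N+1, so multi-line sequences stay intact.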

MMseqs Output (for bugs)

> mmseqs createtaxdb seqdb tmp 
createtaxdb seqdb tmp 

MMseqs Version:         15.6f452
NCBI tax dump directory
Taxonomy mapping file  
Taxonomy mapping mode   0
Taxonomy db mode        1
Threads                 48
Verbosity               3

Loading nodes file ... Done, got 2601214 nodes
Loading merged file ... Done, added 79743 merged nodes.
Loading names file ... Done
mmseqs: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)
Error: createbintaxonomy failed

Context

We are trying to create a custom taxonomy database for MSA, such that the resulting .a3m files contain taxonomy information.

Is a taxonomy database already available for download for uniprot_2024_03 or similar releases?

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version:

Linux 64-bit, 256G memory
MMseqs Version: 15.6f452

zrqiao avatar Aug 11 '24 00:08 zrqiao

Having the same issue. Any insights on how to address this, please?

ahof1704 avatar Aug 12 '24 13:08 ahof1704

I was also getting this error earlier, and interestingly only on Linux (it seemed to work fine on macOS for me).

Strangely, it has been working again for me since about 10 minutes ago, even though my setup is exactly the same as when I was getting the error.

MMseqs2 Version: 45111b641859ed0ddd875b94d6fd1aef1a675b7e

piehld avatar Aug 12 '24 20:08 piehld

Does this happen with the databases download of the uniprot or only if you call createtaxdb manually?

databases goes through a separate branch to extract taxonomic information from uniprot based databases and should not be affected.

milot-mirdita avatar Aug 13 '24 05:08 milot-mirdita

> Does this happen with the databases download of the uniprot or only if you call createtaxdb manually?
>
> databases goes through a separate branch to extract taxonomic information from uniprot based databases and should not be affected.

Thanks for supporting us! This happens when calling createtaxdb manually.

Would you please elaborate on what databases download entails in this context?

To zoom out a bit: is there a feasible mmseqs2 command to generate .a3m files with correct UniRef100 taxonomy identifiers without going through this custom database setup procedure?

zrqiao avatar Aug 13 '24 05:08 zrqiao

mmseqs databases UniProtKB uniprot tmp

should download the latest uniprot and set it up correctly for use with MMseqs2 including taxonomy information.

milot-mirdita avatar Aug 13 '24 06:08 milot-mirdita

> mmseqs databases UniProtKB uniprot tmp
>
> should download the latest uniprot and set it up correctly for use with MMseqs2 including taxonomy information.

Thanks for this - we ran this command and obtained the main database files, including uniprot_h, uniprot.index, etc. However, we still need some help understanding the next steps for assigning taxonomy IDs to alignments. Following https://github.com/sokrypton/ColabFold/issues/216, here is what we tried:

mmseqs convertalis test/qdb uniprot test/res_exp test/res_exp_realign.m8 --format-output query,target,taxid,taxname,taxlineage,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,cigar

and it raised the following error:

Loading NCBI taxonomy
names.dmp, nodes.dmp, merged.dmp from NCBI taxdump could not be found!

Is there something that we are missing here? Thanks!
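
In case it helps with diagnosing this, below is a small check for which taxonomy side files sit next to the database prefix. It is only a sketch: check_tax_files is a made-up helper, and the suffix list (_mapping, _taxonomy, _nodes.dmp, _names.dmp, _merged.dmp) is an assumption about how the taxonomy files are named alongside the database, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of MMseqs2): report which taxonomy side files
# exist for a database prefix. The suffix list is an assumption about file naming.
check_tax_files() {
    # $1 = database prefix, e.g. "uniprot"
    for suffix in _mapping _taxonomy _nodes.dmp _names.dmp _merged.dmp; do
        if [ -e "$1$suffix" ]; then
            echo "found   $1$suffix"
        else
            echo "missing $1$suffix"
        fi
    done
}

# Run from the directory that holds the downloaded database.
check_tax_files "uniprot"
```

The output only shows what is on disk under that prefix; it does not say which of these files convertalis actually requires.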

zrqiao avatar Aug 16 '24 09:08 zrqiao