Converting the ColabFold DB fails (createindex killed)
Expected Behavior
Converting the UniRef30 DB and the ColabFold DB succeeds, so that I can search a FASTA file using colabsearch.sh.
Current Behavior
When I build the database using the commands from https://colabfold.mmseqs.com/, I hit an error: MMseqs2 dies without any hint.
Steps to Reproduce (for bugs)
mmseqs createindex colabfold_envdb_202108_db tmp
MMseqs Output (for bugs)
createindex colabfold_envdb_202108_db tmp
MMseqs Version: 75af0c82edf34587548bacc865cfa1d2261a9696
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
k-mer length 0
Alphabet size aa:21,nucl:5
Compositional bias 1
Max sequence length 65535
Max results per query 300
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Spaced k-mers 1
Spaced k-mer pattern
Sensitivity 7.5
k-score seq:0,prof:0
Check compatible 0
Search type 0
Split database 0
Split memory limit 0
Verbosity 3
Threads 32
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Compressed 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Strand selection 1
Remove temporary files false
indexdb colabfold_envdb_202108_db colabfold_envdb_202108_db --seed-sub-mat aa:VTML80.out,nucl:nucleotide.out -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 32
Target split mode. Searching through 9 splits
Estimated memory consumption: 79G
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write DBR2INDEX (7)
Write DBR2DATA (8)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Write ALNINDEX (24)
Write ALNDATA (25)
Index table: counting k-mers
[=================================================================] 100.00% 23.11M 27m 25s 83ms
Index table: Masked residues: 122354587
Index table: fill
tmp/17913398511991990568/createindex.sh: line 56: 114573 Killed "$MMSEQS" $INDEXER "$INPUT" "$INPUT" ${INDEX_PAR}
Error: indexdb died
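The bare `Killed` in the log above is the typical signature of the kernel OOM killer. A hedged sketch for confirming this from the kernel log; `check_oom` is a hypothetical helper, and reading `dmesg` may require elevated privileges on the node:

```shell
# Hypothetical helper: count OOM-killer messages in kernel log text.
# Typical use on the node: dmesg -T | check_oom
check_oom() {
  grep -ciE 'out of memory|oom-killer|killed process'
}
```

A nonzero count around the time of the crash would confirm the OOM-kill hypothesis.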
Context
Your Environment
- Git commit used (the string after "MMseqs Version:"): 75af0c82edf34587548bacc865cfa1d2261a9696
- MMseqs version used (statically compiled, self-compiled, Homebrew, etc.): self-compiled
- For self-compiled and Homebrew, compiler and CMake versions used: CMake 2.8.10
- Server specifications: CPU supports AVX2/SSE; 100 GB of system memory
- Operating system and version: Linux 3.10.0, x86_64
It seems like the job got killed by the operating system. Could somebody else have been using main memory at the same time?
@martin-steinegger I run MMseqs2 on a Slurm cluster. I salloc a node and log into it to run the job, so nobody else can be using its memory.
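Even on an exclusively allocated node, Slurm can enforce a per-job memory limit (e.g. via cgroups) below the node's physical 100 GB, so it is worth checking what was actually requested. A sketch for extracting the memory request from `scontrol show job` output; the `mem_request` helper is hypothetical:

```shell
# Hypothetical helper: pull the first mem= field out of scontrol output.
# On the allocated node you would run:
#   scontrol show job "$SLURM_JOB_ID" | mem_request
mem_request() {
  grep -oE 'mem=[0-9]+[KMGT]?' | head -n 1
}
```

If the reported request is below ~80 GB, the 79G index build would exceed the cgroup limit even though the node itself has 100 GB.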
Did you try requesting more memory for the job?
An estimated memory consumption of 79G against 100 GB of total memory should ideally work, but some portion of RAM is likely devoted to other tasks, and that could be enough for the OS to terminate the job.
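If the OOM killer is indeed the cause, one workaround is MMseqs2's `--split-memory-limit` flag (it appears as `Split memory limit 0` in the parameter dump above), which caps the memory used while building the index. A sketch that derives a cap from `/proc/meminfo`; the 80% headroom factor is an assumption, not an MMseqs2 recommendation:

```shell
# Suggest a --split-memory-limit value: ~80% of MemAvailable, in whole GiB.
# Assumes a Linux /proc/meminfo (values reported in kB).
suggest_limit() {
  awk '/^MemAvailable:/ {printf "%dG", $2 * 0.8 / 1048576}' "$1"
}

# Usage on the node (not executed here; rerun with a freshly emptied tmp dir):
#   mmseqs createindex colabfold_envdb_202108_db tmp \
#       --split-memory-limit "$(suggest_limit /proc/meminfo)"
```

With the limit set, MMseqs2 splits the index into more pieces instead of trying to hold a 79G working set at once.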