Converting the ColabFold DB fails (createindex killed)
Expected Behavior
Converting the UniRef30 DB and the ColabFold DB succeeds, so that I can search a FASTA file using colabsearch.sh.
Current Behavior
When I build the database using the commands from https://colabfold.mmseqs.com/, I hit an error: MMseqs2 dies without any hint.
Steps to Reproduce (for bugs)
mmseqs createindex colabfold_envdb_202108_db tmp
MMseqs Output (for bugs)
createindex colabfold_envdb_202108_db tmp
MMseqs Version: 75af0c82edf34587548bacc865cfa1d2261a9696
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
k-mer length 0
Alphabet size aa:21,nucl:5
Compositional bias 1
Max sequence length 65535
Max results per query 300
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Spaced k-mers 1
Spaced k-mer pattern
Sensitivity 7.5
k-score seq:0,prof:0
Check compatible 0
Search type 0
Split database 0
Split memory limit 0
Verbosity 3
Threads 32
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Compressed 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Strand selection 1
Remove temporary files false
indexdb colabfold_envdb_202108_db colabfold_envdb_202108_db --seed-sub-mat aa:VTML80.out,nucl:nucleotide.out -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 32
Target split mode. Searching through 9 splits
Estimated memory consumption: 79G
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write DBR2INDEX (7)
Write DBR2DATA (8)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Write ALNINDEX (24)
Write ALNDATA (25)
Index table: counting k-mers
[=================================================================] 100.00% 23.11M 27m 25s 83ms
Index table: Masked residues: 122354587
Index table: fill
tmp/17913398511991990568/createindex.sh: line 56: 114573 Killed "$MMSEQS" $INDEXER "$INPUT" "$INPUT" ${INDEX_PAR}
Error: indexdb died
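The bare `Killed` in the log above is the typical signature of the kernel OOM killer. A hedged sketch for confirming this from the kernel log; `check_oom` is a hypothetical helper, and reading `dmesg` may require elevated privileges on the node:

```shell
# Hypothetical helper: count OOM-killer messages in kernel log text.
# Typical use on the node: dmesg -T | check_oom
check_oom() {
  grep -ciE 'out of memory|oom-killer|killed process'
}
```

A nonzero count around the time of the crash would confirm the OOM-kill hypothesis.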
Context
Your Environment
- Git commit used (the string after "MMseqs Version:"): 75af0c82edf34587548bacc865cfa1d2261a9696
- MMseqs version used (statically compiled, self-compiled, Homebrew, etc.): self-compiled
- For self-compiled and Homebrew, compiler and CMake versions used: CMake 2.8.10
- Server specifications: CPU supports AVX2/SSE; 100 GB of system memory
- Operating system and version: Linux 3.10.0, x86_64
It seems like the job got killed by the operating system. Could somebody else have been using main memory at the same time?
@martin-steinegger I run MMseqs2 on a Slurm cluster. I salloc a node and log into it to run the job, so nobody else can be using its memory.
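Even on an exclusively allocated node, Slurm can enforce a per-job memory limit (e.g. via cgroups) below the node's physical 100 GB, so it is worth checking what was actually requested. A sketch for extracting the memory request from `scontrol show job` output; the `mem_request` helper is hypothetical:

```shell
# Hypothetical helper: pull the first mem= field out of scontrol output.
# On the allocated node you would run:
#   scontrol show job "$SLURM_JOB_ID" | mem_request
mem_request() {
  grep -oE 'mem=[0-9]+[KMGT]?' | head -n 1
}
```

If the reported request is below ~80 GB, the 79G index build would exceed the cgroup limit even though the node itself has 100 GB.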
Did you try requesting more memory for the job?
An estimated memory consumption of 79G against 100 GB of total memory should ideally work, but some portion of RAM is likely devoted to other tasks, and that could be enough for the OS to terminate the job.
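If the OOM killer is indeed the cause, one workaround is MMseqs2's `--split-memory-limit` flag (it appears as `Split memory limit 0` in the parameter dump above), which caps the memory used while building the index. A sketch that derives a cap from `/proc/meminfo`; the 80% headroom factor is an assumption, not an MMseqs2 recommendation:

```shell
# Suggest a --split-memory-limit value: ~80% of MemAvailable, in whole GiB.
# Assumes a Linux /proc/meminfo (values reported in kB).
suggest_limit() {
  awk '/^MemAvailable:/ {printf "%dG", $2 * 0.8 / 1048576}' "$1"
}

# Usage on the node (not executed here; rerun with a freshly emptied tmp dir):
#   mmseqs createindex colabfold_envdb_202108_db tmp \
#       --split-memory-limit "$(suggest_limit /proc/meminfo)"
```

With the limit set, MMseqs2 splits the index into more pieces instead of trying to hold a 79G working set at once.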