"indexdb died" error when creating colabfold_envdb_202108_db with MMseqs2
Hi,
I was trying to set up the database, but it breaks when I run this command:
mmseqs createindex colabfold_envdb_202108_db tmp2 --remove-tmp-files 1
The error message I get is this:
MMseqs Version: edb8223d1ea07385ffe63d4f103af0eb12b2058e
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
k-mer length 0
Alphabet size aa:21,nucl:5
Compositional bias 1
Max sequence length 65535
Max results per query 300
Mask residues 1
Mask lower case residues 0
Spaced k-mers 1
Spaced k-mer pattern
Sensitivity 7.5
k-score seq:0,prof:0
Check compatible 0
Search type 0
Split database 0
Split memory limit 0
Verbosity 3
Threads 8
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Compressed 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Strand selection 1
Remove temporary files true
indexdb colabfold_envdb_202108_db colabfold_envdb_202108_db --seed-sub-mat aa:VTML80.out,nucl:nucleotide.out -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 8
Target split mode. Searching through 34 splits
Estimated memory consumption: 29G
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write DBR2INDEX (7)
Killed
Error: indexdb died
It works fine with the uniref30_2103.tar.gz file, though.
How can I resolve this problem?
G.V.
I assume your computer does not have enough RAM. How much RAM does your server have?
I am using an AWS p3.2xlarge instance. It has around 61GB of RAM.

Online searches: our ColabFold server has ~760GB RAM and keeps the full database and index in memory. Batch searches: a batch search requires less memory, but it's still approximately 1 byte per residue, so I would assume you need at least 90GB. We still need to figure out what the lower bound is for this database.
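Based on the sizing above, one way to keep the index step inside available RAM is MMseqs2's `--split-memory-limit` option combined with `--split 0` (automatic split selection). The snippet below is only a sketch, assuming a Linux host with /proc/meminfo; the "use half of free RAM" heuristic is my own assumption, not an official recommendation:

```shell
# Rough sizing helper (sketch, assumes Linux /proc/meminfo): pick a
# --split-memory-limit that leaves headroom for the OS and other processes.
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo 2>/dev/null)
avail_kb=${avail_kb:-67108864}               # fall back to 64G if /proc is absent
limit_gb=$(( avail_kb / 1024 / 1024 / 2 ))   # use roughly half of available RAM
[ "$limit_gb" -lt 1 ] && limit_gb=1
echo "suggested: --split 0 --split-memory-limit ${limit_gb}G"
# Uncomment to run the actual index build (needs mmseqs on PATH and lots of disk):
# mmseqs createindex colabfold_envdb_202108_db tmp2 \
#     --remove-tmp-files 1 --split 0 --split-memory-limit ${limit_gb}G
```

Note that even with splitting, the 61GB on a p3.2xlarge may be below the ~90GB floor estimated above for this database.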
I have 128G RAM, but I get the same error.
The error:
Estimated memory consumption: 560G
Process needs more than 38G main memory.
Increase the size of --split or set it to 0 to automatically optimize target database split.
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write DBR2INDEX (7)
Write DBR2DATA (8)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Write ALNINDEX (24)
Write ALNDATA (25)
Index table: counting k-mers
[=================================================================] 100.00% 209.34M 7m 34s 698ms
Index table: Masked residues: 1117805658
Can not allocate entries memory in IndexTable::initMemory
Error: indexdb died
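The log itself suggests the retry: increase `--split` or set it to 0 so MMseqs2 optimizes the target database split automatically. A sketch of that retry for a 128G machine follows; the 100G cap is an assumption chosen to leave headroom for the OS and the k-mer counting phase, not a verified lower bound:

```shell
#!/bin/sh
# Sketch of the retry suggested by the "Increase the size of --split or set it
# to 0" message above: automatic splitting plus an explicit per-split memory cap.
DB=colabfold_envdb_202108_db
LIMIT=100G   # assumption for a 128G host; adjust to your free RAM
CMD="mmseqs createindex $DB tmp2 --remove-tmp-files 1 --split 0 --split-memory-limit $LIMIT"
echo "$CMD"
# Uncomment to actually run (requires mmseqs on PATH):
# $CMD
```

If even the split build fails at the "counting k-mers" stage, the practical alternative is to skip `createindex` entirely and let `mmseqs search` split the target database at query time, at the cost of slower searches.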