prefilter step died when running easy-search: Segmentation fault (core dumped)
Expected Behavior
easy-search should finish execution without errors
Current Behavior
Error during pre-filter step
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
Segmentation fault (core dumped) ] 0.00% 1 eta -
Error: Prefilter died
Error: Search step died
Error: Search died
Steps to Reproduce (for bugs)
First create a custom nucleotide database
mmseqs createdb --dbtype 2 --compressed 1 refseq_bacteria_archaea_fungi_viral.fna.gz seqTaxDB
mmseqs createtaxdb seqTaxDB tmp --ncbi-tax-dump ncbi-taxdump --tax-mapping-file fastaid_taxid.tsv
Next run easy-search
mmseqs easy-search all_nuc.fasta seqTaxDB tax_assignments.txt tmp --search-type 3 --min-seq-id 0.65 -e 0.01 -c 0.8 --cov-mode 2 --threads 16
MMseqs Output (for bugs)
Below is the output of easy-search
easy-search all_nuc.fasta seqTaxDB tax_assignments.txt tmp --search-type 3 --min-seq-id 0.65 -e 0.01 -c 0.8 --cov-mode 2 --threads 16
MMseqs Version: 8ef39f4151eddcdc78f9c2dadf6b4dd6864435c9
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.01
Seq. id. threshold 0.65
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 16
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Prefilter mode 0
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb all_nuc.fasta tmp/7701176895607249840/query --dbtype 0 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences [1335322] 2s 17mss
Time for merging to query_h: 0h 0m 0s 221ms
Time for merging to query: 0h 0m 1s 64ms
Database type: Nucleotide
Time for processing: 0h 0m 4s 959ms
Create directory tmp/7701176895607249840/search_tmp
search tmp/7701176895607249840/query seqTaxDB tmp/7701176895607249840/result tmp/7701176895607249840/search_tmp --alignment-mode 3 -e 0.01 --min-seq-id 0.65 -c 0.8 --cov-mode 2 --threads 16 -s 5.7 --search-type 3 --remove-tmp-files 1
splitsequence seqTaxDB tmp/7701176895607249840/search_tmp/9045538653068861586/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 16 --compressed 0 -v 3
[=================================================================] 100.00% 22.15M 12s 856ms
Time for merging to target_seqs_split_h: 0h 0m 31s 837ms
Time for merging to target_seqs_split: 0h 0m 35s 517ms
Time for processing: 0h 1m 59s 373ms
extractframes tmp/7701176895607249840/query tmp/7701176895607249840/search_tmp/9045538653068861586/query_seqs --forward-frames 1 --reverse-frames 1 --create-lookup 0 --threads 16 --compressed 0 -v 3
[=================================================================] 100.00% 1.34M 0s 620ms
Time for merging to query_seqs_h: 0h 0m 0s 734ms
Time for merging to query_seqs: 0h 0m 2s 576ms
Time for processing: 0h 0m 5s 91ms
splitsequence tmp/7701176895607249840/search_tmp/9045538653068861586/query_seqs tmp/7701176895607249840/search_tmp/9045538653068861586/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 16 --compressed 0 -v 3
[=================================================================] 100.00% 2.67M 0s 919ms
Time for merging to query_seqs_split_h: 0h 0m 0s 832ms
Time for merging to query_seqs_split: 0h 0m 0s 878ms
Time for processing: 0h 0m 3s 919ms
prefilter tmp/7701176895607249840/search_tmp/9045538653068861586/query_seqs_split tmp/7701176895607249840/search_tmp/9045538653068861586/target_seqs_split tmp/7701176895607249840/search_tmp/9045538653068861586/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0.8 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 16 --compressed 0 -v 3 -s 5.7
Query database size: 2670930 type: Nucleotide
Target split mode. Searching through 18 splits
Estimated memory consumption: 326G
Target database size: 100684280 type: Nucleotide
Process prefiltering step 1 of 18
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
Segmentation fault (core dumped) ] 0.00% 1 eta -
Error: Prefilter died
Error: Search step died
Error: Search died
Context
Hi, I am trying to run a nucleotide-nucleotide search in MMseqs2 with a custom database. This error does not occur with a different, smaller nucleotide database.
Thank you very much for this amazing tool and all your hard work.
Your Environment
I am using a Google Cloud VM with 64 CPUs and 416 GB of memory, running Ubuntu 20.04.
I installed MMseqs2 with the following commands:
# static build with AVX2 (fastest)
wget https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz; tar xvfz mmseqs-linux-avx2.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH
I get the same error when running a nucleotide search with MMseqs2 against the NCBI NT database. I am running on our internal server with 256 GB of memory.
I've encountered segfaults with MMseqs2 caused by insufficient memory (which, according to a quick web search, is a plausible cause of segfaults). Large databases like NT/GTDB might need around 900 GB of RAM, so I would guess that too little RAM is the reason in your cases as well.
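One way to sanity-check the memory hypothesis is to compare the prefilter's own estimate with the RAM on the host. The sketch below is a minimal, Linux-only example: it assumes the easy-search output was redirected to a file (the name `easy-search.log` is hypothetical) and that the log contains an `Estimated memory consumption:` line like the one in the report above. The helper function names are made up for illustration.

```shell
# Print the estimate token (e.g. "326G") from a saved MMseqs2 log,
# or nothing if the log has no such line.
estimated_memory() {
    grep -o 'Estimated memory consumption: [0-9]*[A-Za-z]*' "$1" | head -n 1 | awk '{print $NF}'
}

# Print total physical memory in GiB, rounded down (reads /proc/meminfo, Linux only).
total_memory_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

log="easy-search.log"   # hypothetical file name; adjust to wherever the output was saved
if [ -f "$log" ]; then
    echo "MMseqs2 estimate: $(estimated_memory "$log")"
    echo "Host memory:      $(total_memory_gib)G"
fi
```

The prefilter call in the log above runs with `--split-memory-limit 0`, which I understand to mean "use all available memory". If the estimate is close to the host total, passing an explicit cap such as `--split-memory-limit 300G` to `easy-search` might be worth a try, though nobody in this thread has confirmed that it avoids the crash.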
Can you try again with release 16? We fixed a bug that caused excessive memory consumption in the prefilter for some target databases.
I use release 16.747c6 (via conda); however, I still get this error.
Does release 17 possibly solve this problem, @LauraVP1994? We seem to be hitting something similar...
This should be fixed in release 17. Please update to that release.
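After updating, it may help to confirm which build is actually on the `PATH`. The following is a rough sketch, not an official check: it relies on `mmseqs version` printing the version string, and assumes release builds start with the release number followed by `-` or `.` (as in `16.747c6` quoted above), whereas the static "latest" download reports a bare commit hash (as in the log in the original report). The function name `release_major` is made up for illustration.

```shell
# Print the leading release number from a version string such as "16.747c6";
# print nothing for a bare commit hash, which carries no release number.
release_major() {
    printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\)[-.].*$/\1/p'
}

if command -v mmseqs >/dev/null 2>&1; then
    version="$(mmseqs version)"
    major="$(release_major "$version")"
    if [ -z "$major" ]; then
        echo "Build $version is identified by commit hash; check it against the release history manually."
    elif [ "$major" -ge 17 ]; then
        echo "Release $version should include the fix mentioned above."
    else
        echo "Release $version predates 17; consider updating."
    fi
fi
```

If the reported version is still older than 17 after an update, a second installation earlier on the `PATH` (for example a conda environment versus the static tarball) is one possible explanation.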