MMseqs2
easy-search (speed-up)
Expected Behavior
I want to functionally annotate the representative sequences in a 75 GB FASTA file with eggNOG and PFAM.
Current Behavior
I started with the eggNOG annotation, which has now been running for more than 120 hours. Is there a way to speed up the process?
MMseqs Output (for bugs)
repSEQS.fna
Create directory repSEQS_eggnog.tmp
easy-search repSEQS.fna databases/eggnog repSEQS_eggnog.csv repSEQS_eggnog.tmp \
--dbtype 2 \
--split-memory-limit 300G \
--threads 56 \
--remove-tmp-files false \
--greedy-best-hits 1
MMseqs Version: 8ff26f23a6b880df36cadb707890084503ceaffb
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 56
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 300
Split database 0
Split mode 2
Split memory limit 300G
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Gap pseudo count 10
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 2
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits true
Alignment backtraces will be computed, since they were requested by output format.
createdb repSEQS.fna repSEQS_eggnog.tmp/16640501639052377423/query --dbtype 2 --shuffle 1 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences
[=================================================================================================== 1 Mio. sequences processed
=================================================================================================== 2 Mio. sequences processed
=================================================================================================== 3 Mio. sequences processed
=================================================================================================== 4 Mio. sequences processed
=================================================================================================== 5 Mio. sequences processed
=================================================================================================== 6 Mio. sequences processed
=================================================================================================== 7 Mio. sequences processed
=================================================================================================== 8 Mio. sequences processed
=================================================================================================== 9 Mio. sequences processed
=================================================================================================== 10 Mio. sequences processed
=================================================================================================== 11 Mio. sequences processed
=================================================================================================== 12 Mio. sequences processed
=================================================================================================== 13 Mio. sequences processed
=================================================================================================== 14 Mio. sequences processed
=================================================================================================== 15 Mio. sequences processed
=================================================================================================== 16 Mio. sequences processed
=================================================================================================== 17 Mio. sequences processed
=================================================================================================== 18 Mio. sequences processed
=================================================================================================== 19 Mio. sequences processed
=================================================================================================== 20 Mio. sequences processed
=================================================================================================== 21 Mio. sequences processed
=================================================================================================== 22 Mio. sequences processed
=================================================================================================== 23 Mio. sequences processed
=================================================================================================== 24 Mio. sequences processed
=================================================================================================== 25 Mio. sequences processed
=================================================================================================== 26 Mio. sequences processed
=================================================================================================== 27 Mio. sequences processed
=================================================================================================== 28 Mio. sequences processed
=================================================================================================== 29 Mio. sequences processed
=================================================================================================== 30 Mio. sequences processed
=================================================================================================== 31 Mio. sequences processed
=================================================================================================== 32 Mio. sequences processed
=================================================================================================== 33 Mio. sequences processed
=================================================================================================== 34 Mio. sequences processed
=================================================================================================== 35 Mio. sequences processed
=================================================================================================== 36 Mio. sequences processed
=================================================================================================== 37 Mio. sequences processed
=================================================================================================== 38 Mio. sequences processed
=================================================================================================== 39 Mio. sequences processed
=================================================================================================== 40 Mio. sequences processed
=================================================================================================== 41 Mio. sequences processed
=================================================================================================== 42 Mio. sequences processed
=================================================================================================== 43 Mio. sequences processed
=================================================================================================== 44 Mio. sequences processed
=================================================================================================== 45 Mio. sequences processed
=================================================================================================== 46 Mio. sequences processed
=================================================================================================== 47 Mio. sequences processed
=================================================================================================== 48 Mio. sequences processed
=================================================================================================== 49 Mio. sequences processed
=================================================================================================== 50 Mio. sequences processed
=================================================================================================== 51 Mio. sequences processed
=================================================================================================== 52 Mio. sequences processed
=================================================================================================== 53 Mio. sequences processed
=================================================================================================== 54 Mio. sequences processed
=================================================================================================== 55 Mio. sequences processed
=================================================================================================== 56 Mio. sequences processed
=================================================================================================== 57 Mio. sequences processed
=================================================================================================== 58 Mio. sequences processed
=================================================================================================== 59 Mio. sequences processed
=================================================================================================== 60 Mio. sequences processed
=================================================================================================== 61 Mio. sequences processed
=================================================================================================== 62 Mio. sequences processed
=================================================================================================== 63 Mio. sequences processed
=================================================================================================== 64 Mio. sequences processed
=================================================================================================== 65 Mio. sequences processed
=================================================================================================== 66 Mio. sequences processed
=================================================================================================== 67 Mio. sequences processed
=================================================================================================== 68 Mio. sequences processed
=================================================================================================== 69 Mio. sequences processed
=================================================================================================== 70 Mio. sequences processed
=================================================================================================== 71 Mio. sequences processed
=================================================================================================== 72 Mio. sequences processed
=================================================================================================== 73 Mio. sequences processed
=================================================================================================== 74 Mio. sequences processed
=================================================================================================== 75 Mio. sequences processed
=================================================================================================== 76 Mio. sequences processed
=================================================================================================== 77 Mio. sequences processed
=================================================================================================== 78 Mio. sequences processed
=================================================================================================== 79 Mio. sequences processed
=================================================================================================== 80 Mio. sequences processed
=================================================================================================== 81 Mio. sequences processed
=================================================================================================== 82 Mio. sequences processed
=================================================================================================== 83 Mio. sequences processed
=================================================================================================== 84 Mio. sequences processed
=================================================================================================== 85 Mio. sequences processed
=================================================================================================== 86 Mio. sequences processed
=================================================================================================== 87 Mio. sequences processed
=================================================================================================== 88 Mio. sequences processed
=================================================================================================== 89 Mio. sequences processed
=================================================================================================== 90 Mio. sequences processed
=================================================================================================== 91 Mio. sequences processed
=================================================================================================== 92 Mio. sequences processed
=================================================================================================== 93 Mio. sequences processed
=================================================================================================== 94 Mio. sequences processed
=================================================================================================== 95 Mio. sequences processed
=============================
Time for merging to query_h: 0h 0m 32s 329ms
Time for merging to query: 0h 3m 16s 622ms
Database type: Nucleotide
Time for processing: 0h 27m 53s 813ms
Create directory repSEQS_eggnog.tmp/16640501639052377423/search_tmp
search repSEQS_eggnog.tmp/16640501639052377423/query databases/eggnog repSEQS_eggnog.tmp/16640501639052377423/result repSEQS_eggnog.tmp/16640501639052377423/search_tmp -a 1 --alignment-mode 3 --threads 56 -s 5.7 --split-memory-limit 300G --remove-tmp-files 0
extractorfs repSEQS_eggnog.tmp/16640501639052377423/query repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/q_orfs_aa --min-length 30 --max-length 32734 --max-gaps 2147483647 --contig-start-mode 2 --contig-end-mode 2 --orf-start-mode 1 --forward-frames 1,2,3 --reverse-frames 1,2,3 --translation-table 1 --translate 1 --use-all-table-starts 0 --id-offset 0 --create-lookup 0 --threads 56 --compressed 0 -v 3
[=================================================================] 95.29M 10m 53s 267ms
Time for merging to q_orfs_aa_h: 0h 14m 59s 800ms
Time for merging to q_orfs_aa: 0h 33m 4s 490ms
Time for processing: 1h 14m 4s 658ms
prefilter repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/q_orfs_aa databases/eggnog repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 5.7 -k 5 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 300G -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 56 --compressed 0 -v 3
Query database size: 1303062545 type: Aminoacid
Estimated memory consumption: 2G
Target database size: 349750 type: Profile
Index table k-mer threshold: 82 at k-mer size 5
Index table: counting k-mers
[=================================================================] 349.75K 1m 42s 520ms
Index table: Masked residues: 0
Index table: fill
[=================================================================] 349.75K 5m 18s 145ms
Index statistics
Entries: 14682023111
DB size: 84042 MB
Avg k-mer size: 3594.921651
Top 10 k-mers
PPPPW 38077
PPPWW 37617
PPWPP 34827
PPPGW 33942
WWWPP 33931
PPPDW 33516
PPWPW 33505
PPWRW 32205
PWPPW 31944
PPPQW 31811
Time for index table init: 0h 9m 20s 184ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 82
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 1303062545
Target db start 1 to 349750
[=================================================================] 1.30B 86h 42m 2s 376ms
0.785483 k-mers per position
240012 DB matches per sequence
5731753 overflows
0 queries produce too many hits (truncated result)
269 sequences passed prefiltering per query sequence
300 median result list length
134238 sequences with 0 size result lists
Time for merging to pref: 0h 30m 15s 580ms
Time for processing: 88h 9m 11s 291ms
swapresults repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/q_orfs_aa databases/eggnog repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -e 0.001 --split-memory-limit 300G --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --threads 56 --compressed 0 --db-load-mode 0 -v 3
Computing offsets.
[=================================================================] 1.30B 2h 8m 45s 98ms
Reading results.
[=================================================================] 1.30B 5h 47m 7s 401ms
Output database: repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped
[=================================================================] 26.35K 11m 16s 126ms
Time for merging to pref_swapped_0: 0h 40m 12s 625ms
Reading results.
[=================================================================] 1.30B 5h 40m 43s 346ms
Output database: repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped
[=================================================================] 32.57K 11m 9s 696ms
Time for merging to pref_swapped_1: 0h 38m 42s 418ms
Reading results.
[=================================================================] 1.30B 5h 39m 21s 0ms
Output database: repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped
[=================================================================] 27.87K 11m 16s 144ms
Time for merging to pref_swapped_2: 0h 39m 55s 667ms
Reading results.
[=================================================================] 1.30B 5h 36m 38s 949ms
Output database: repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped
[=================================================================] 25.02K 11m 10s 765ms
Time for merging to pref_swapped_3: 0h 38m 48s 751ms
Reading results.
[=================================================================] 1.30B 5h 35m 5s 521ms
Output database: repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped
[=================================================================] 28.27K 11m 14s 658ms
Time for merging to pref_swapped_4: 0h 40m 16s 359ms
Reading results.
[=================================================================] 1.30B 6h 4m 24s 557ms
Output database: repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped
[=================================================================] 32.79K 11m 19s 893ms
Time for merging to pref_swapped_5: 0h 40m 17s 973ms
Reading results.
[=================================================================] 1.30B 6h 3m 45s 577ms
Output database: repSEQS_eggnog.tmp/16640501639052377423/search_tmp/1950629703809443685/search/pref_swapped
[=================================================================] 22.66K 11m 12s 347ms
Time for merging to pref_swapped_6: 0h 40m 8s 817ms
Reading results.
[============================
Your Environment
- Ubuntu 18.04
- CPU platform: Intel Haswell x86/64
- Boot disk size: 18 TB
- 64 vCPUs and 425984 MiB of memory
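To make it easier to see where the time went, here is a minimal sketch (plain Python, not part of MMseqs2) that converts the "Time for processing" durations from the log above into seconds and reports each completed stage's share of the total. The helper names `to_seconds` and `stage_shares` are hypothetical, and the durations are copied by hand from the log rather than parsed from a file.

```python
import re

# Durations copied by hand from the "Time for processing" lines in the log.
# Only stages that had finished are listed; swapresults was still running.
COMPLETED_STAGES = {
    "createdb": "0h 27m 53s 813ms",
    "extractorfs": "1h 14m 4s 658ms",
    "prefilter": "88h 9m 11s 291ms",
}

# Matches the "Xh Ym Zs Wms" duration format printed in the log.
DURATION = re.compile(r"(\d+)h (\d+)m (\d+)s (\d+)ms")


def to_seconds(text):
    """Convert an 'Xh Ym Zs Wms' duration string to seconds."""
    match = DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def stage_shares(durations):
    """Return each stage's fraction of the summed runtime."""
    secs = {name: to_seconds(value) for name, value in durations.items()}
    total = sum(secs.values())
    return {name: s / total for name, s in secs.items()}


if __name__ == "__main__":
    for name, share in stage_shares(COMPLETED_STAGES).items():
        print(f"{name:12s} {share:6.1%}")
```

With these values the prefilter step accounts for roughly 98% of the time spent in the completed stages. The swapresults step is not included because it had not finished when the log was captured.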