Issues tweaking prefilter
I'm trying to align a metabarcoding dataset against a large COI reference database. With default parameters, easy-search misses some expected 100% identity hits, and finding them requires increasing --max-seqs to an excessive 100,000. Is there any way to tweak prefiltering so that I can bring this down a bit? I tried adjusting --min-ungapped-score, but for my test query sequence not a single sequence passes with --min-ungapped-score 256, while --min-ungapped-score 255 results in 100,924 sequences passing.
resultDB_pref exists and will be overwritten
prefilter --max-seqs 1000000 --min-ungapped-score 255 queryDB targetDB resultDB_pref
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max sequence length 65535
Max results per query 1000000
Split database 0
Split mode 2
Split memory limit 0
Coverage threshold 0
Coverage mode 0
Compositional bias 1
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 255
Selected taxa
Include identical seq. id. false
Spaced k-mers 1
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads 8
Compressed 0
Verbosity 3
Query database size: 1 type: Nucleotide
Estimated memory consumption: 9G
Target database size: 2518106 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 6
Index table: counting k-mers
[=================================================================] 2.52M 23s 186ms
Index table: Masked residues: 7081796
Index table: fill
[=================================================================] 2.52M 23s 885ms
Index statistics
Entries: 1034168203
DB size: 5917 MB
Avg k-mer size: 252482.471436
Top 10 k-mers
TATTTT 2130492
TTTAAT 2063205
TTTTTT 2002420
AATTTT 1941438
TTCTAT 1836825
TTACTT 1819642
TATCTT 1785271
TTATTT 1762536
ATTGGG 1746212
TATGGG 1668386
Time for index table init: 0h 0m 56s 624ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 1
Target db start 1 to 2518106
[=================================================================] 1 0s 1ms
0.971246 k-mers per position
235058738 DB matches per sequence
1 overflows
100924 sequences passed prefiltering per query sequence
100924 median result list length
0 sequences with 0 size result lists
Time for merging to resultDB_pref: 0h 0m 0s 2ms
Time for processing: 0h 1m 5s 376ms
resultDB_pref exists and will be overwritten
prefilter --max-seqs 1000000 --min-ungapped-score 256 queryDB targetDB resultDB_pref
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 4
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max sequence length 65535
Max results per query 1000000
Split database 0
Split mode 2
Split memory limit 0
Coverage threshold 0
Coverage mode 0
Compositional bias 1
Compositional bias 1
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 256
Selected taxa
Include identical seq. id. false
Spaced k-mers 1
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads 8
Compressed 0
Verbosity 3
Query database size: 1 type: Nucleotide
Estimated memory consumption: 9G
Target database size: 2518106 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 6
Index table: counting k-mers
[=================================================================] 2.52M 23s 139ms
Index table: Masked residues: 7081796
Index table: fill
[=================================================================] 2.52M 23s 952ms
Index statistics
Entries: 1034168203
DB size: 5917 MB
Avg k-mer size: 252482.471436
Top 10 k-mers
TATTTT 2130492
TTTAAT 2063205
TTTTTT 2002420
AATTTT 1941438
TTCTAT 1836825
TTACTT 1819642
TATCTT 1785271
TTATTT 1762536
ATTGGG 1746212
TATGGG 1668386
Time for index table init: 0h 0m 56s 647ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 1
Target db start 1 to 2518106
[=================================================================] 1 0s 1ms
0.971246 k-mers per position
235058738 DB matches per sequence
1 overflows
0 sequences passed prefiltering per query sequence
0 median result list length
1 sequences with 0 size result lists
Time for merging to resultDB_pref: 0h 0m 0s 2ms
Time for processing: 0h 1m 5s 315ms
When prefilter is called through the search workflow for nucleotide sequences, it sets a default k-mer size of 15 (along with a number of other parameters).
Apparently, we don't set a sensible k-mer size by default when prefilter is called directly for nucleotides.
Can you please try with the following parameters:
--exact-kmer-matching 1 -k 15 --strand 2
The last one (--strand) is optional, depending on your application. By default, prefilter only reports hits on the forward strand.
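For reference, here is a minimal sketch of how the suggested flags could be added to the direct prefilter call from the first post. The database names and the --max-seqs value are copied from that call. The run helper and the DRY_RUN variable are illustrative assumptions, not part of MMseqs2; by default the script only prints the command.

```shell
#!/bin/sh
# Sketch only: print (and optionally run) a prefilter call that uses the
# nucleotide settings suggested above instead of the direct-call defaults.

# run: hypothetical helper, not an MMseqs2 feature. It echoes the command
# and executes it only when DRY_RUN=0 is set in the environment.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# queryDB, targetDB and resultDB_pref are the names used in the original post;
# replace them with your own database paths.
run mmseqs prefilter queryDB targetDB resultDB_pref \
  --max-seqs 1000000 \
  --exact-kmer-matching 1 -k 15 --strand 2
```

Setting DRY_RUN=0 before calling the script would execute the command instead of only printing it.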
Thanks. I have been testing a bit more with easy-search instead, and I still need a lot of sequences to pass the prefilter to get my expected hits. I'm not sure I'm using --min-ungapped-score correctly, as I get either 0 sequences passing or >50k. I assume this is going to cause issues when I have variable amplicon lengths.
Here are some tests. The db is available here:
| --max-seqs | --min-ungapped-score | passing prefilter | hits |
|---|---|---|---|
| 10,000,000 | default (15) | 720,381 | 2 |
| 1,000,000 | default (15) | 503,454 | 2 |
| 100,000 | default (15) | 53,454 | 2 |
| 10,000 | default (15) | 8,454 | 0 |
| 1,000,000 | 256 | 0 | 0 |
| 1,000,000 | 255 | 54,149 | 2 |
| 1,000,000 | 250 | 275,406 | 2 |
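The two result columns above can be pulled out of the captured logs rather than read by eye. The sketch below assumes the logs were saved as log_1.txt, log_2.txt, and so on (as in the commands that follow), and that "passing prefilter" and "hits" correspond to the "sequences passed prefiltering per query sequence" and "sequence pairs passed the thresholds" log lines. The summarize_log function name is made up for this example.

```shell
#!/bin/sh
# Sketch only: extract the prefilter count and the number of accepted
# alignments from an easy-search log written with "> log_N.txt".

# summarize_log: hypothetical helper; prints "<passed prefilter>\t<hits>".
summarize_log() {
  awk '
    /sequences passed prefiltering per query sequence/ { pref = $1 }
    /sequence pairs passed the thresholds/             { hits = $1 }
    END { printf "%s\t%s\n", pref, hits }
  ' "$1"
}

# Summarize every matching log in the current directory, if any exist.
for f in log_*.txt; do
  [ -e "$f" ] || continue
  printf '%s\t' "$f"
  summarize_log "$f"
done
```

Each output row holds the log file name, the number of sequences passing the prefilter, and the number of hits, which makes it easier to compare runs with different --max-seqs or --min-ungapped-score values.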
mmseqs easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 10000000 query.fasta targetDB alnRes.m8 tmp > log_1.txt
alnRes.m8 exists and will be overwritten
easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 10000000 query.fasta targetDB alnRes.m8 tmp
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 10000000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 0
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Translation mode 0
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb query.fasta tmp/18070968075722086726/query --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 0ms
Create directory tmp/18070968075722086726/search_tmp
search tmp/18070968075722086726/query targetDB tmp/18070968075722086726/result tmp/18070968075722086726/search_tmp --alignment-mode 3 --min-seq-id 0.95 -c 1 --cov-mode 2 -s 5.7 --max-seqs 10000000 --search-type 3 --remove-tmp-files 1
splitsequence targetDB tmp/18070968075722086726/search_tmp/16018015712622693086/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 2.52M 0s 363ms
Time for merging to target_seqs_split_h: 0h 0m 0s 599ms
Time for merging to target_seqs_split: 0h 0m 0s 597ms
Time for processing: 0h 0m 2s 686ms
extractframes tmp/18070968075722086726/query tmp/18070968075722086726/search_tmp/16018015712622693086/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 1ms
Time for merging to query_seqs_h: 0h 0m 0s 0ms
Time for merging to query_seqs: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 5ms
splitsequence tmp/18070968075722086726/search_tmp/16018015712622693086/query_seqs tmp/18070968075722086726/search_tmp/16018015712622693086/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/18070968075722086726/search_tmp/16018015712622693086/query_seqs_split tmp/18070968075722086726/search_tmp/16018015712622693086/target_seqs_split tmp/18070968075722086726/search_tmp/16018015712622693086/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 10000000 --split 0 --split-mode 2 --split-memory-limit 0 -c 1 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3 -s 5.7
Query database size: 2 type: Nucleotide
Estimated memory consumption: 17G
Target database size: 2518126 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 2.52M 30s 913ms
Index table: Masked residues: 7081790
Index table: fill
[=================================================================] 2.52M 51s 476ms
Index statistics
Entries: 1183100244
DB size: 14961 MB
Avg k-mer size: 1.101848
Top 10 k-mers
TTTTATAATTTATGT 1129779
CGATAATATAGTTTG 904748
CATCTTTATATTTTT 719520
ATATGGAGATGAATG 627787
ATATATACTATATGG 589217
CCTTATAATGGTTGG 505210
ATTTTCTTACTGCGG 479550
TTTCCGAAAAAATAG 467381
ATTCATATATCGACG 459675
GCCCGATAAGTCCCG 433993
Time for index table init: 0h 1m 37s 914ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 2518126
[=================================================================] 2 0s 2ms
0.929712 k-mers per position
3341324 DB matches per sequence
1 overflows
720381 sequences passed prefiltering per query sequence
1433854 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 0ms
Time for processing: 0h 1m 40s 284ms
align tmp/18070968075722086726/search_tmp/16018015712622693086/query_seqs_split tmp/18070968075722086726/search_tmp/16018015712622693086/target_seqs_split tmp/18070968075722086726/search_tmp/16018015712622693086/search/pref_0 tmp/18070968075722086726/search_tmp/16018015712622693086/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.95 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 1 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 8 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2 type: Nucleotide
Target database size: 2518126 type: Nucleotide
Calculation of alignments
[=================================================================] 2 0s 652ms
Time for merging to aln: 0h 0m 0s 0ms
1174037 alignments calculated
2 sequence pairs passed the thresholds (0.000002 of overall calculated)
1.000000 hits per query sequence
Time for processing: 0h 2m 0s 576ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/search/pref_0 -v 3
Time for processing: 0h 0m 0s 2ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/search/aln_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/search/input_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/search/aln_merge -v 3
Time for processing: 0h 0m 0s 0ms
offsetalignment tmp/18070968075722086726/query tmp/18070968075722086726/search_tmp/16018015712622693086/query_seqs_split targetDB tmp/18070968075722086726/search_tmp/16018015712622693086/target_seqs_split tmp/18070968075722086726/search_tmp/16018015712622693086/aln tmp/18070968075722086726/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 8 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 0ms
Writing results to: tmp/18070968075722086726/result
[=================================================================] 1 0s 0ms
Time for merging to result: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 189ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/q_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/q_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/t_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/18070968075722086726/search_tmp/16018015712622693086/t_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
alnRes.m8 exists and will be overwritten
convertalis tmp/18070968075722086726/query targetDB tmp/18070968075722086726/result alnRes.m8 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --db-output 0 --db-load-mode 0 --search-type 3 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to alnRes.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 210ms
rmdb tmp/18070968075722086726/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/18070968075722086726/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/18070968075722086726/query_h -v 3
Time for processing: 0h 0m 0s 0ms
mmseqs easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 query.fasta targetDB alnRes.m8 tmp > log_2.txt
alnRes.m8 exists and will be overwritten
easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 query.fasta targetDB alnRes.m8 tmp
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 1000000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 0
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Translation mode 0
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
search tmp/12500607059053347296/query targetDB tmp/12500607059053347296/result tmp/12500607059053347296/search_tmp --alignment-mode 3 --min-seq-id 0.95 -c 1 --cov-mode 2 -s 5.7 --max-seqs 1000000 --search-type 3 --remove-tmp-files 1
tmp/12500607059053347296/search_tmp/14998827955956736013/target_seqs_split exists and will be overwritten
splitsequence targetDB tmp/12500607059053347296/search_tmp/14998827955956736013/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 2.52M 0s 399ms
Time for merging to target_seqs_split_h: 0h 0m 0s 608ms
Time for merging to target_seqs_split: 0h 0m 0s 601ms
Time for processing: 0h 0m 2s 975ms
extractframes tmp/12500607059053347296/query tmp/12500607059053347296/search_tmp/14998827955956736013/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 1ms
Time for merging to query_seqs_h: 0h 0m 0s 0ms
Time for merging to query_seqs: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 4ms
splitsequence tmp/12500607059053347296/search_tmp/14998827955956736013/query_seqs tmp/12500607059053347296/search_tmp/14998827955956736013/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/12500607059053347296/search_tmp/14998827955956736013/query_seqs_split tmp/12500607059053347296/search_tmp/14998827955956736013/target_seqs_split tmp/12500607059053347296/search_tmp/14998827955956736013/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 1000000 --split 0 --split-mode 2 --split-memory-limit 0 -c 1 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3 -s 5.7
Query database size: 2 type: Nucleotide
Estimated memory consumption: 17G
Target database size: 2518126 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 2.52M 30s 888ms
Index table: Masked residues: 7081790
Index table: fill
[=================================================================] 2.52M 51s 562ms
Index statistics
Entries: 1183100244
DB size: 14961 MB
Avg k-mer size: 1.101848
Top 10 k-mers
TTTTATAATTTATGT 1129779
CGATAATATAGTTTG 904748
CATCTTTATATTTTT 719520
ATATGGAGATGAATG 627787
ATATATACTATATGG 589217
CCTTATAATGGTTGG 505210
ATTTTCTTACTGCGG 479550
TTTCCGAAAAAATAG 467381
ATTCATATATCGACG 459675
GCCCGATAAGTCCCG 433993
Time for index table init: 0h 1m 37s 903ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 2518126
[=================================================================] 2 0s 1ms
0.929712 k-mers per position
3341324 DB matches per sequence
1 overflows
503454 sequences passed prefiltering per query sequence
1000000 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 0ms
Time for processing: 0h 1m 40s 149ms
align tmp/12500607059053347296/search_tmp/14998827955956736013/query_seqs_split tmp/12500607059053347296/search_tmp/14998827955956736013/target_seqs_split tmp/12500607059053347296/search_tmp/14998827955956736013/search/pref_0 tmp/12500607059053347296/search_tmp/14998827955956736013/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.95 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 1 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 8 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2 type: Nucleotide
Target database size: 2518126 type: Nucleotide
Calculation of alignments
[=================================================================] 2 0s 650ms
Time for merging to aln: 0h 0m 0s 0ms
886165 alignments calculated
2 sequence pairs passed the thresholds (0.000002 of overall calculated)
1.000000 hits per query sequence
Time for processing: 0h 1m 34s 102ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/search/pref_0 -v 3
Time for processing: 0h 0m 0s 2ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/search/aln_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/search/input_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/search/aln_merge -v 3
Time for processing: 0h 0m 0s 0ms
offsetalignment tmp/12500607059053347296/query tmp/12500607059053347296/search_tmp/14998827955956736013/query_seqs_split targetDB tmp/12500607059053347296/search_tmp/14998827955956736013/target_seqs_split tmp/12500607059053347296/search_tmp/14998827955956736013/aln tmp/12500607059053347296/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 8 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 0ms
Writing results to: tmp/12500607059053347296/result
[=================================================================] 1 0s 0ms
Time for merging to result: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 186ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/q_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/q_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/t_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/12500607059053347296/search_tmp/14998827955956736013/t_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
alnRes.m8 exists and will be overwritten
convertalis tmp/12500607059053347296/query targetDB tmp/12500607059053347296/result alnRes.m8 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --db-output 0 --db-load-mode 0 --search-type 3 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to alnRes.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 210ms
rmdb tmp/12500607059053347296/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/12500607059053347296/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/12500607059053347296/query_h -v 3
Time for processing: 0h 0m 0s 0ms
mmseqs easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 100000 query.fasta targetDB alnRes.m8 tmp > log_3.txt
alnRes.m8 exists and will be overwritten
easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 100000 query.fasta targetDB alnRes.m8 tmp
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 100000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 0
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Translation mode 0
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb query.fasta tmp/1014849045596888582/query --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 0ms
Create directory tmp/1014849045596888582/search_tmp
search tmp/1014849045596888582/query targetDB tmp/1014849045596888582/result tmp/1014849045596888582/search_tmp --alignment-mode 3 --min-seq-id 0.95 -c 1 --cov-mode 2 -s 5.7 --max-seqs 100000 --search-type 3 --remove-tmp-files 1
splitsequence targetDB tmp/1014849045596888582/search_tmp/10591307209608589943/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 2.52M 0s 390ms
Time for merging to target_seqs_split_h: 0h 0m 0s 612ms
Time for merging to target_seqs_split: 0h 0m 0s 605ms
Time for processing: 0h 0m 2s 712ms
extractframes tmp/1014849045596888582/query tmp/1014849045596888582/search_tmp/10591307209608589943/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to query_seqs_h: 0h 0m 0s 0ms
Time for merging to query_seqs: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 4ms
splitsequence tmp/1014849045596888582/search_tmp/10591307209608589943/query_seqs tmp/1014849045596888582/search_tmp/10591307209608589943/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/1014849045596888582/search_tmp/10591307209608589943/query_seqs_split tmp/1014849045596888582/search_tmp/10591307209608589943/target_seqs_split tmp/1014849045596888582/search_tmp/10591307209608589943/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 100000 --split 0 --split-mode 2 --split-memory-limit 0 -c 1 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3 -s 5.7
Query database size: 2 type: Nucleotide
Estimated memory consumption: 17G
Target database size: 2518126 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 2.52M 30s 703ms
Index table: Masked residues: 7081790
Index table: fill
[=================================================================] 2.52M 51s 21ms
Index statistics
Entries: 1183100244
DB size: 14961 MB
Avg k-mer size: 1.101848
Top 10 k-mers
TTTTATAATTTATGT 1129779
CGATAATATAGTTTG 904748
CATCTTTATATTTTT 719520
ATATGGAGATGAATG 627787
ATATATACTATATGG 589217
CCTTATAATGGTTGG 505210
ATTTTCTTACTGCGG 479550
TTTCCGAAAAAATAG 467381
ATTCATATATCGACG 459675
GCCCGATAAGTCCCG 433993
Time for index table init: 0h 1m 37s 254ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 2518126
[=================================================================] 2 0s 1ms
0.929712 k-mers per position
3341324 DB matches per sequence
1 overflows
53454 sequences passed prefiltering per query sequence
100000 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 0ms
Time for processing: 0h 1m 39s 187ms
align tmp/1014849045596888582/search_tmp/10591307209608589943/query_seqs_split tmp/1014849045596888582/search_tmp/10591307209608589943/target_seqs_split tmp/1014849045596888582/search_tmp/10591307209608589943/search/pref_0 tmp/1014849045596888582/search_tmp/10591307209608589943/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.95 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 1 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 8 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2 type: Nucleotide
Target database size: 2518126 type: Nucleotide
Calculation of alignments
[=================================================================] 2 10s 383ms
Time for merging to aln: 0h 0m 0s 0ms
100989 alignments calculated
2 sequence pairs passed the thresholds (0.000020 of overall calculated)
1.000000 hits per query sequence
Time for processing: 0h 0m 11s 255ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/search/pref_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/search/aln_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/search/input_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/search/aln_merge -v 3
Time for processing: 0h 0m 0s 0ms
offsetalignment tmp/1014849045596888582/query tmp/1014849045596888582/search_tmp/10591307209608589943/query_seqs_split targetDB tmp/1014849045596888582/search_tmp/10591307209608589943/target_seqs_split tmp/1014849045596888582/search_tmp/10591307209608589943/aln tmp/1014849045596888582/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 8 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 0ms
Writing results to: tmp/1014849045596888582/result
[=================================================================] 1 0s 0ms
Time for merging to result: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 187ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/q_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/q_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/t_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/search_tmp/10591307209608589943/t_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
alnRes.m8 exists and will be overwritten
convertalis tmp/1014849045596888582/query targetDB tmp/1014849045596888582/result alnRes.m8 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --db-output 0 --db-load-mode 0 --search-type 3 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to alnRes.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 210ms
rmdb tmp/1014849045596888582/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1014849045596888582/query_h -v 3
Time for processing: 0h 0m 0s 0ms
mmseqs easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 10000 query.fasta targetDB alnRes.m8 tmp > log_4.txt
alnRes.m8 exists and will be overwritten
easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 10000 query.fasta targetDB alnRes.m8 tmp
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 10000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 15
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 0
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Translation mode 0
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb query.fasta tmp/5999847066562356512/query --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 0ms
Create directory tmp/5999847066562356512/search_tmp
search tmp/5999847066562356512/query targetDB tmp/5999847066562356512/result tmp/5999847066562356512/search_tmp --alignment-mode 3 --min-seq-id 0.95 -c 1 --cov-mode 2 -s 5.7 --max-seqs 10000 --search-type 3 --remove-tmp-files 1
splitsequence targetDB tmp/5999847066562356512/search_tmp/2012207027480231373/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 2.52M 0s 388ms
Time for merging to target_seqs_split_h: 0h 0m 0s 598ms
Time for merging to target_seqs_split: 0h 0m 0s 600ms
Time for processing: 0h 0m 2s 748ms
extractframes tmp/5999847066562356512/query tmp/5999847066562356512/search_tmp/2012207027480231373/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to query_seqs_h: 0h 0m 0s 0ms
Time for merging to query_seqs: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 4ms
splitsequence tmp/5999847066562356512/search_tmp/2012207027480231373/query_seqs tmp/5999847066562356512/search_tmp/2012207027480231373/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/5999847066562356512/search_tmp/2012207027480231373/query_seqs_split tmp/5999847066562356512/search_tmp/2012207027480231373/target_seqs_split tmp/5999847066562356512/search_tmp/2012207027480231373/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 10000 --split 0 --split-mode 2 --split-memory-limit 0 -c 1 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3 -s 5.7
Query database size: 2 type: Nucleotide
Estimated memory consumption: 17G
Target database size: 2518126 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 2.52M 30s 754ms
Index table: Masked residues: 7081790
Index table: fill
[=================================================================] 2.52M 51s 34ms
Index statistics
Entries: 1183100244
DB size: 14961 MB
Avg k-mer size: 1.101848
Top 10 k-mers
TTTTATAATTTATGT 1129779
CGATAATATAGTTTG 904748
CATCTTTATATTTTT 719520
ATATGGAGATGAATG 627787
ATATATACTATATGG 589217
CCTTATAATGGTTGG 505210
ATTTTCTTACTGCGG 479550
TTTCCGAAAAAATAG 467381
ATTCATATATCGACG 459675
GCCCGATAAGTCCCG 433993
Time for index table init: 0h 1m 37s 161ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 2518126
[=================================================================] 2 0s 1ms
0.929712 k-mers per position
3341324 DB matches per sequence
1 overflows
8454 sequences passed prefiltering per query sequence
10000 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 0ms
Time for processing: 0h 1m 39s 18ms
align tmp/5999847066562356512/search_tmp/2012207027480231373/query_seqs_split tmp/5999847066562356512/search_tmp/2012207027480231373/target_seqs_split tmp/5999847066562356512/search_tmp/2012207027480231373/search/pref_0 tmp/5999847066562356512/search_tmp/2012207027480231373/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.95 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 1 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 8 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2 type: Nucleotide
Target database size: 2518126 type: Nucleotide
Calculation of alignments
[=================================================================] 2 1s 45ms
Time for merging to aln: 0h 0m 0s 0ms
16181 alignments calculated
0 sequence pairs passed the thresholds (0.000000 of overall calculated)
0.000000 hits per query sequence
Time for processing: 0h 0m 1s 916ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/search/pref_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/search/aln_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/search/input_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/search/aln_merge -v 3
Time for processing: 0h 0m 0s 0ms
offsetalignment tmp/5999847066562356512/query tmp/5999847066562356512/search_tmp/2012207027480231373/query_seqs_split targetDB tmp/5999847066562356512/search_tmp/2012207027480231373/target_seqs_split tmp/5999847066562356512/search_tmp/2012207027480231373/aln tmp/5999847066562356512/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 8 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 0ms
Writing results to: tmp/5999847066562356512/result
[=================================================================] 1 0s 0ms
Time for merging to result: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 185ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/q_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/q_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/t_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/search_tmp/2012207027480231373/t_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
alnRes.m8 exists and will be overwritten
convertalis tmp/5999847066562356512/query targetDB tmp/5999847066562356512/result alnRes.m8 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --db-output 0 --db-load-mode 0 --search-type 3 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to alnRes.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 209ms
rmdb tmp/5999847066562356512/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/5999847066562356512/query_h -v 3
Time for processing: 0h 0m 0s 0ms
mmseqs easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 --min-ungapped-score 256 query.fasta targetDB alnRes.m8 tmp > log_5.txt
alnRes.m8 exists and will be overwritten
easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 --min-ungapped-score 256 query.fasta targetDB alnRes.m8 tmp
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 1000000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 256
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 0
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Translation mode 0
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb query.fasta tmp/685034194575351025/query --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 0ms
Create directory tmp/685034194575351025/search_tmp
search tmp/685034194575351025/query targetDB tmp/685034194575351025/result tmp/685034194575351025/search_tmp --alignment-mode 3 --min-seq-id 0.95 -c 1 --cov-mode 2 -s 5.7 --max-seqs 1000000 --min-ungapped-score 256 --search-type 3 --remove-tmp-files 1
splitsequence targetDB tmp/685034194575351025/search_tmp/13572469368390980697/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 2.52M 0s 374ms
Time for merging to target_seqs_split_h: 0h 0m 0s 598ms
Time for merging to target_seqs_split: 0h 0m 0s 600ms
Time for processing: 0h 0m 2s 722ms
extractframes tmp/685034194575351025/query tmp/685034194575351025/search_tmp/13572469368390980697/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 1ms
Time for merging to query_seqs_h: 0h 0m 0s 0ms
Time for merging to query_seqs: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 5ms
splitsequence tmp/685034194575351025/search_tmp/13572469368390980697/query_seqs tmp/685034194575351025/search_tmp/13572469368390980697/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/685034194575351025/search_tmp/13572469368390980697/query_seqs_split tmp/685034194575351025/search_tmp/13572469368390980697/target_seqs_split tmp/685034194575351025/search_tmp/13572469368390980697/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 1000000 --split 0 --split-mode 2 --split-memory-limit 0 -c 1 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 256 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3 -s 5.7
Query database size: 2 type: Nucleotide
Estimated memory consumption: 17G
Target database size: 2518126 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 2.52M 30s 759ms
Index table: Masked residues: 7081790
Index table: fill
[=================================================================] 2.52M 51s 43ms
Index statistics
Entries: 1183100244
DB size: 14961 MB
Avg k-mer size: 1.101848
Top 10 k-mers
TTTTATAATTTATGT 1129779
CGATAATATAGTTTG 904748
CATCTTTATATTTTT 719520
ATATGGAGATGAATG 627787
ATATATACTATATGG 589217
CCTTATAATGGTTGG 505210
ATTTTCTTACTGCGG 479550
TTTCCGAAAAAATAG 467381
ATTCATATATCGACG 459675
GCCCGATAAGTCCCG 433993
Time for index table init: 0h 1m 37s 220ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 2518126
[=================================================================] 2 0s 2ms
0.929712 k-mers per position
3341324 DB matches per sequence
1 overflows
0 sequences passed prefiltering per query sequence
0 median result list length
2 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 0ms
Time for processing: 0h 1m 39s 65ms
align tmp/685034194575351025/search_tmp/13572469368390980697/query_seqs_split tmp/685034194575351025/search_tmp/13572469368390980697/target_seqs_split tmp/685034194575351025/search_tmp/13572469368390980697/search/pref_0 tmp/685034194575351025/search_tmp/13572469368390980697/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.95 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 1 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 8 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2 type: Nucleotide
Target database size: 2518126 type: Nucleotide
Calculation of alignments
[=================================================================] 2 0s 0ms
Time for merging to aln: 0h 0m 0s 0ms
0 alignments calculated
0 sequence pairs passed the thresholds
0.000000 hits per query sequence
Time for processing: 0h 0m 0s 218ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/search/pref_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/search/aln_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/search/input_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/search/aln_merge -v 3
Time for processing: 0h 0m 0s 0ms
offsetalignment tmp/685034194575351025/query tmp/685034194575351025/search_tmp/13572469368390980697/query_seqs_split targetDB tmp/685034194575351025/search_tmp/13572469368390980697/target_seqs_split tmp/685034194575351025/search_tmp/13572469368390980697/aln tmp/685034194575351025/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 8 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 0ms
Writing results to: tmp/685034194575351025/result
[=================================================================] 1 0s 0ms
Time for merging to result: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 184ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/q_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/q_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/t_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/search_tmp/13572469368390980697/t_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
alnRes.m8 exists and will be overwritten
convertalis tmp/685034194575351025/query targetDB tmp/685034194575351025/result alnRes.m8 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --db-output 0 --db-load-mode 0 --search-type 3 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to alnRes.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 207ms
rmdb tmp/685034194575351025/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/685034194575351025/query_h -v 3
Time for processing: 0h 0m 0s 0ms
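To compare the runs without scrolling through the full output, the few counters that matter can be pulled out of the saved logs (log_4.txt, log_5.txt, and so on). The following is only a rough sketch: the function name `summarize_log` and the dictionary keys are made up for illustration, and the patterns assume the log lines are worded exactly as in the output pasted here.

```python
import re

# Patterns copied from the wording of the log lines in this thread.
# The key names are hypothetical labels, not MMseqs2 terminology.
PATTERNS = {
    "max_seqs": re.compile(r"^[ \t]*Max results per query\s+(\d+)", re.M),
    "min_ungapped_score": re.compile(r"^[ \t]*Minimum diagonal score\s+(\d+)", re.M),
    "passed_prefilter": re.compile(
        r"^[ \t]*(\d+) sequences passed prefiltering per query sequence", re.M
    ),
    "alignments": re.compile(r"^[ \t]*(\d+) alignments calculated", re.M),
    "pairs_passed": re.compile(
        r"^[ \t]*(\d+) sequence pairs passed the thresholds", re.M
    ),
}


def summarize_log(text):
    """Return the first value found for each counter, or None if absent."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        summary[name] = int(match.group(1)) if match else None
    return summary
```

Reading each log file and printing `summarize_log(open(path).read())` puts the --max-seqs and --min-ungapped-score settings of each run next to the number of sequences that passed prefiltering and the number of pairs that passed the alignment thresholds.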
mmseqs easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 --min-ungapped-score 255 query.fasta targetDB alnRes.m8 tmp > log_6.txt
alnRes.m8 exists and will be overwritten
easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 --min-ungapped-score 255 query.fasta targetDB alnRes.m8 tmp
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 1000000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 255
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 0
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Translation mode 0
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb query.fasta tmp/1364566272043144560/query --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 0ms
Create directory tmp/1364566272043144560/search_tmp
search tmp/1364566272043144560/query targetDB tmp/1364566272043144560/result tmp/1364566272043144560/search_tmp --alignment-mode 3 --min-seq-id 0.95 -c 1 --cov-mode 2 -s 5.7 --max-seqs 1000000 --min-ungapped-score 255 --search-type 3 --remove-tmp-files 1
splitsequence targetDB tmp/1364566272043144560/search_tmp/5295598480935868791/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 2.52M 0s 339ms
Time for merging to target_seqs_split_h: 0h 0m 0s 601ms
Time for merging to target_seqs_split: 0h 0m 0s 597ms
Time for processing: 0h 0m 2s 700ms
extractframes tmp/1364566272043144560/query tmp/1364566272043144560/search_tmp/5295598480935868791/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 1ms
Time for merging to query_seqs_h: 0h 0m 0s 0ms
Time for merging to query_seqs: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 4ms
splitsequence tmp/1364566272043144560/search_tmp/5295598480935868791/query_seqs tmp/1364566272043144560/search_tmp/5295598480935868791/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/1364566272043144560/search_tmp/5295598480935868791/query_seqs_split tmp/1364566272043144560/search_tmp/5295598480935868791/target_seqs_split tmp/1364566272043144560/search_tmp/5295598480935868791/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 1000000 --split 0 --split-mode 2 --split-memory-limit 0 -c 1 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 255 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3 -s 5.7
Query database size: 2 type: Nucleotide
Estimated memory consumption: 17G
Target database size: 2518126 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 2.52M 30s 765ms
Index table: Masked residues: 7081790
Index table: fill
[=================================================================] 2.52M 51s 228ms
Index statistics
Entries: 1183100244
DB size: 14961 MB
Avg k-mer size: 1.101848
Top 10 k-mers
TTTTATAATTTATGT 1129779
CGATAATATAGTTTG 904748
CATCTTTATATTTTT 719520
ATATGGAGATGAATG 627787
ATATATACTATATGG 589217
CCTTATAATGGTTGG 505210
ATTTTCTTACTGCGG 479550
TTTCCGAAAAAATAG 467381
ATTCATATATCGACG 459675
GCCCGATAAGTCCCG 433993
Time for index table init: 0h 1m 37s 401ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 2518126
[=================================================================] 2 0s 1ms
0.929712 k-mers per position
3341324 DB matches per sequence
1 overflows
54149 sequences passed prefiltering per query sequence
107973 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 0ms
Time for processing: 0h 1m 39s 360ms
align tmp/1364566272043144560/search_tmp/5295598480935868791/query_seqs_split tmp/1364566272043144560/search_tmp/5295598480935868791/target_seqs_split tmp/1364566272043144560/search_tmp/5295598480935868791/search/pref_0 tmp/1364566272043144560/search_tmp/5295598480935868791/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.95 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 1 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 8 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2 type: Nucleotide
Target database size: 2518126 type: Nucleotide
Calculation of alignments
[=================================================================] 2 11s 186ms
Time for merging to aln: 0h 0m 0s 0ms
102344 alignments calculated
2 sequence pairs passed the thresholds (0.000020 of overall calculated)
1.000000 hits per query sequence
Time for processing: 0h 0m 11s 443ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/search/pref_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/search/aln_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/search/input_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/search/aln_merge -v 3
Time for processing: 0h 0m 0s 0ms
offsetalignment tmp/1364566272043144560/query tmp/1364566272043144560/search_tmp/5295598480935868791/query_seqs_split targetDB tmp/1364566272043144560/search_tmp/5295598480935868791/target_seqs_split tmp/1364566272043144560/search_tmp/5295598480935868791/aln tmp/1364566272043144560/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 8 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 0ms
Writing results to: tmp/1364566272043144560/result
[=================================================================] 1 0s 0ms
Time for merging to result: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 187ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/q_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/q_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/t_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/search_tmp/5295598480935868791/t_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
alnRes.m8 exists and will be overwritten
convertalis tmp/1364566272043144560/query targetDB tmp/1364566272043144560/result alnRes.m8 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --db-output 0 --db-load-mode 0 --search-type 3 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 1ms
Time for merging to alnRes.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 208ms
rmdb tmp/1364566272043144560/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/1364566272043144560/query_h -v 3
Time for processing: 0h 0m 0s 0ms
mmseqs easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 --min-ungapped-score 200 query.fasta targetDB alnRes.m8 tmp > log_7.txt
alnRes.m8 exists and will be overwritten
easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 --max-seqs 1000000 --min-ungapped-score 200 query.fasta targetDB alnRes.m8 tmp
MMseqs Version: 17.b804f
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 3
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 2
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Threads 8
Compressed 0
Verbosity 3
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
Sensitivity 5.7
k-mer length 0
Target search mode 0
k-score seq:2147483647,prof:2147483647
Alphabet size aa:21,nucl:5
Max results per query 1000000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
Minimum diagonal score 200
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 0
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.001
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 3
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files true
Translation mode 0
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output false
Overlap threshold 0
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 0
Greedy best hits false
createdb query.fasta tmp/17855978298342352080/query --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 0 --id-offset 0 --compressed 0 -v 3
Converting sequences
[
Time for merging to query_h: 0h 0m 0s 0ms
Time for merging to query: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 0ms
Create directory tmp/17855978298342352080/search_tmp
search tmp/17855978298342352080/query targetDB tmp/17855978298342352080/result tmp/17855978298342352080/search_tmp --alignment-mode 3 --min-seq-id 0.95 -c 1 --cov-mode 2 -s 5.7 --max-seqs 1000000 --min-ungapped-score 200 --search-type 3 --remove-tmp-files 1
splitsequence targetDB tmp/17855978298342352080/search_tmp/17305500037039267211/target_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 2.52M 0s 404ms
Time for merging to target_seqs_split_h: 0h 0m 0s 597ms
Time for merging to target_seqs_split: 0h 0m 0s 595ms
Time for processing: 0h 0m 2s 706ms
extractframes tmp/17855978298342352080/query tmp/17855978298342352080/search_tmp/17305500037039267211/query_seqs --forward-frames 1 --reverse-frames 1 --translation-table 1 --translate 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 1ms
Time for merging to query_seqs_h: 0h 0m 0s 0ms
Time for merging to query_seqs: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 4ms
splitsequence tmp/17855978298342352080/search_tmp/17305500037039267211/query_seqs tmp/17855978298342352080/search_tmp/17305500037039267211/query_seqs_split --max-seq-len 10000 --sequence-overlap 0 --sequence-split-mode 1 --headers-split-mode 0 --create-lookup 0 --threads 8 --compressed 0 -v 3
Time for processing: 0h 0m 0s 0ms
prefilter tmp/17855978298342352080/search_tmp/17305500037039267211/query_seqs_split tmp/17855978298342352080/search_tmp/17305500037039267211/target_seqs_split tmp/17855978298342352080/search_tmp/17305500037039267211/search/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 15 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 10000 --max-seqs 1000000 --split 0 --split-mode 2 --split-memory-limit 0 -c 1 --cov-mode 2 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 1 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 200 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 8 --compressed 0 -v 3 -s 5.7
Query database size: 2 type: Nucleotide
Estimated memory consumption: 17G
Target database size: 2518126 type: Nucleotide
Index table k-mer threshold: 0 at k-mer size 15
Index table: counting k-mers
[=================================================================] 2.52M 30s 755ms
Index table: Masked residues: 7081790
Index table: fill
[=================================================================] 2.52M 51s 187ms
Index statistics
Entries: 1183100244
DB size: 14961 MB
Avg k-mer size: 1.101848
Top 10 k-mers
TTTTATAATTTATGT 1129779
CGATAATATAGTTTG 904748
CATCTTTATATTTTT 719520
ATATGGAGATGAATG 627787
ATATATACTATATGG 589217
CCTTATAATGGTTGG 505210
ATTTTCTTACTGCGG 479550
TTTCCGAAAAAATAG 467381
ATTCATATATCGACG 459675
GCCCGATAAGTCCCG 433993
Time for index table init: 0h 1m 37s 320ms
Process prefiltering step 1 of 1
k-mer similarity threshold: 0
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 2
Target db start 1 to 2518126
[=================================================================] 2 0s 2ms
0.929712 k-mers per position
3341324 DB matches per sequence
1 overflows
275406 sequences passed prefiltering per query sequence
548481 median result list length
0 sequences with 0 size result lists
Time for merging to pref_0: 0h 0m 0s 0ms
Time for processing: 0h 1m 39s 410ms
align tmp/17855978298342352080/search_tmp/17305500037039267211/query_seqs_split tmp/17855978298342352080/search_tmp/17305500037039267211/target_seqs_split tmp/17855978298342352080/search_tmp/17305500037039267211/search/pref_0 tmp/17855978298342352080/search_tmp/17305500037039267211/aln --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.95 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 1 --cov-mode 2 --max-seq-len 10000 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 8 --compressed 0 -v 3
Compute score, coverage and sequence identity
Query database size: 2 type: Nucleotide
Target database size: 2518126 type: Nucleotide
Calculation of alignments
[=================================================================] 2 0s 258ms
Time for merging to aln: 0h 0m 0s 0ms
509189 alignments calculated
2 sequence pairs passed the thresholds (0.000004 of overall calculated)
1.000000 hits per query sequence
Time for processing: 0h 0m 55s 455ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/search/pref_0 -v 3
Time for processing: 0h 0m 0s 1ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/search/aln_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/search/input_0 -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/search/aln_merge -v 3
Time for processing: 0h 0m 0s 0ms
offsetalignment tmp/17855978298342352080/query tmp/17855978298342352080/search_tmp/17305500037039267211/query_seqs_split targetDB tmp/17855978298342352080/search_tmp/17305500037039267211/target_seqs_split tmp/17855978298342352080/search_tmp/17305500037039267211/aln tmp/17855978298342352080/result --chain-alignments 0 --merge-query 1 --search-type 3 --threads 8 --compressed 0 --db-load-mode 0 -v 3
Computing ORF lookup
Computing contig offsets
Computing contig lookup
Time for contig lookup: 0h 0m 0s 0ms
Writing results to: tmp/17855978298342352080/result
[=================================================================] 1 0s 0ms
Time for merging to result: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 189ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/q_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/q_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/t_orfs -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/17855978298342352080/search_tmp/17305500037039267211/t_orfs_aa -v 3
Time for processing: 0h 0m 0s 0ms
alnRes.m8 exists and will be overwritten
convertalis tmp/17855978298342352080/query targetDB tmp/17855978298342352080/result alnRes.m8 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --format-mode 0 --format-output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits --translation-table 1 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --db-output 0 --db-load-mode 0 --search-type 3 --threads 8 --compressed 0 -v 3
[=================================================================] 1 0s 0ms
Time for merging to alnRes.m8: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 208ms
rmdb tmp/17855978298342352080/result -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/17855978298342352080/query -v 3
Time for processing: 0h 0m 0s 0ms
rmdb tmp/17855978298342352080/query_h -v 3
Time for processing: 0h 0m 0s 0ms
Hi @pieterprovoost. I had a similar problem, and what helped was setting --spaced-kmer-mode 0.
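For reference, here is a rough sketch of how that could look with the flags from the easy-search runs logged above. Only `--spaced-kmer-mode 0` is new; the file names are the ones from the original commands, `log_spaced0.txt` is a made-up log name, and leaving `--max-seqs` at its default is an assumption I have not verified on this dataset.

```shell
#!/bin/sh
# Flags copied from the easy-search runs in the logs above; the only change
# is --spaced-kmer-mode 0 (contiguous k-mers instead of the spaced pattern).
# Assumption: --max-seqs is left at its default, which may or may not be
# enough to recover the 100% identity hit.
set -- easy-search --search-type 3 --min-seq-id 0.95 -c 1 --cov-mode 2 \
    --spaced-kmer-mode 0 query.fasta targetDB alnRes.m8 tmp

if command -v mmseqs >/dev/null 2>&1; then
    # log_spaced0.txt is a hypothetical log file name
    mmseqs "$@" > log_spaced0.txt
else
    echo "mmseqs not found; would run: mmseqs $*"
fi
```

If the expected hit still only shows up with a very large `--max-seqs`, comparing the "sequences passed prefiltering per query sequence" line in the new log against the runs above should show whether the change made any difference.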