MMseqs2: Why is mmseqs2 much slower than blastn?

Hi friends, I want to align some nucleotide sequences to my references, so I tested mmseqs easy-search and traditional blastn using the shell command 'time'. I found that the running time for mmseqs is much longer than for blastn, which seems unreasonable. The commands I used are listed below:

time mmseqs easy-search test.mapped.fasta mmseqs2-nt test.mapped.tsv --alignment-mode 3 --prefilter-mode 1 tmp -s 1 --threads 100 --format-output "query,qheader,qlen,target,theader,tlen,alnlen,pident,fident,nident,qcov,tcov,qseq,tseq,qaln,taln,qstart,qend,tstart,tend,mismatch,evalue"

time blastn -db nt -query test.mapped.fasta -out test.mapped.blastn -evalue 1e-5 -outfmt "6 qseqid qlen sseqid stitle slen qstart qend sstart send qseq sseq length qcovs pident" -num_threads 100 -max_target_seqs 5 -task blastn

for mmseqs: (timing output not preserved)

and for blastn: (timing output not preserved)

Does anyone know the reason? Thanks!

Feb 07 '25 04:02 544728460

In my opinion, it is because mmseqs acts like tblastx, which would take 6x the computation.

Feb 07 '25 08:02 shenwei356

So why is it claimed that 'MMseqs2 can run 10000 times faster than BLAST'? It is actually slower than BLAST...

Feb 07 '25 09:02 544728460

I think that in the README and the paper, BLAST refers to its protein-protein search tool, blastp.

To be fair, for nucleotide sequence search, you might need to compare mmseqs with tblastx.

If you just want to align some nucleotide sequences to the nt dataset, just use blastn or another tool such as LexicMap. mmseqs is more sensitive than other tools and returns more divergent alignments.

Feb 07 '25 10:02 shenwei356

Okay, got it! Thanks! I will search for other tools later.

Feb 07 '25 13:02 544728460

Thanks for the help here, @shenwei356. I believe the search that @544728460 performed was actually on DNA sequences. A few points to consider: MMseqs2 gets faster with more queries since it prebuilds an index. Alternatively, you can pre-generate an index using createindex, but this requires additional disk space and may need more RAM. Additionally, I'm not entirely sure which algorithm runs by default in BLASTN. If it was running in megablast mode, the k-mer size would be much larger than in MMseqs2, which could lead to reduced sensitivity. For a fair comparison, sensitivity parameters should be adjusted accordingly.

Feb 07 '25 14:02 martin-steinegger
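
A minimal sketch of the createindex suggestion above, assuming the target database is the `mmseqs2-nt` database from the original command and that the search is nucleotide against nucleotide (`--search-type 3`). The `run` helper, the `DRY_RUN` switch, and the thread count are illustrative assumptions, not part of MMseqs2. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: build the target index once, then run the search separately,
# so the one-off indexing cost is not counted in every timed search.
set -eu

DRY_RUN="${DRY_RUN:-1}"      # assumption: 1 = only print commands
QUERY="test.mapped.fasta"    # query file from the thread
TARGET_DB="mmseqs2-nt"       # target database name from the thread
TMP_DIR="tmp"
THREADS=16                   # assumption: adjust to the machine

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# One-off cost: precompute the index for a nucleotide search.
run mmseqs createindex "$TARGET_DB" "$TMP_DIR" --search-type 3 --threads "$THREADS"

# Repeated cost: the search itself, run against the indexed database.
run mmseqs easy-search "$QUERY" "$TARGET_DB" test.mapped.tsv "$TMP_DIR" \
    --search-type 3 --threads "$THREADS"
```

Timing the second command on its own (for example by prefixing it with `time`) separates the search cost from the indexing cost, which is the comparison that matters when many queries are run against the same database.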

Yes, I did run mmseqs on DNA sequences, and I also created index files for the nt library, which served as the reference, to minimize the analysis time. Still, I found mmseqs slower than blastn.

Feb 07 '25 15:02 544728460

Does the index fit into memory? What is the k-mer length of blastn?

Feb 07 '25 15:02 martin-steinegger

I think the index does fit into memory, since no error was reported. And I didn't use megablast mode; I used the parameter "-task blastn". How can I find the specific k-mer length of blastn?

Feb 10 '25 07:02 544728460

By default, blastn runs in megablast mode, which uses a word size of 28 bp.
With "-task blastn", the default word size changes to 11.

Feb 10 '25 08:02 shenwei356
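
To make the seed length visible in a benchmark rather than relying on task defaults, the word size can be passed explicitly with `-word_size`. The sketch below only prints the two command lines for comparison; the `blastn_cmd` helper and the output file names are hypothetical, and the remaining options are copied from the original blastn command with a simplified `-outfmt 6`.

```shell
#!/usr/bin/env bash
# Sketch: print blastn command lines with the word size stated explicitly,
# so the seed length used in each timed run is unambiguous.
set -eu

QUERY="test.mapped.fasta"   # query file from the thread
DB="nt"                     # BLAST database from the thread

blastn_cmd() {
    # $1 = task name, $2 = word size, $3 = output file (hypothetical name)
    echo "blastn -task $1 -word_size $2 -db $DB -query $QUERY -out $3 -evalue 1e-5 -outfmt 6 -num_threads 100 -max_target_seqs 5"
}

# megablast default seed length (28) versus the "-task blastn" default (11).
blastn_cmd megablast 28 test.megablast.tsv
blastn_cmd blastn 11 test.blastn.tsv
```

Running each printed command under `time` gives two blastn baselines, one per seed length, against which the mmseqs run can be compared.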

So maybe it's not the k-mer length that slows down the analysis?

Feb 11 '25 08:02 544728460