Why is MMseqs2 much slower than blastn?
Hi, friends. I want to align some nucleotide sequences to my references, so I timed mmseqs easy-search and traditional blastn with the shell command 'time'. The running time for mmseqs was much longer than for blastn, which seems unreasonable. The commands I used are listed below:
time mmseqs easy-search test.mapped.fasta mmseqs2-nt test.mapped.tsv --alignment-mode 3 --prefilter-mode 1 tmp -s 1 --threads 100 --format-output "query,qheader,qlen,target,theader,tlen,alnlen,pident,fident,nident,qcov,tcov,qseq,tseq,qaln,taln,qstart,qend,tstart,tend,mismatch,evalue"
time blastn -db nt -query test.mapped.fasta -out test.mapped.blastn -evalue 1e-5 -outfmt "6 qseqid qlen sseqid stitle slen qstart qend sstart send qseq sseq length qcovs pident" -num_threads 100 -max_target_seqs 5 -task blastn
for mmseqs:
and for blastn:
Does anyone know the reason? Thanks!
In my opinion, it is because mmseqs acts like tblastx here, which would take about 6x the computation.
So why is it claimed that 'MMseqs2 can run 10000 times faster than BLAST'? In my test it is actually slower than BLAST.
I think that in the README and the paper, BLAST refers to its protein-protein search tool, blastp.
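To show where a factor of six would come from in a translated (tblastx-style) search, here is a minimal Python sketch of six-frame translation. It is only an illustration, not code from MMseqs2 or BLAST; the function names are made up for this example, and it assumes plain A/C/G/T input and the standard genetic code.

```python
# Standard genetic code, codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate seq codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return the 3 forward and 3 reverse-complement translations (hypothetical helper)."""
    revcomp = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (seq, revcomp) for offset in range(3)]


frames = six_frames("ATGGCC")
print(len(frames), frames[0])
```

Every nucleotide sequence turns into six protein sequences, so a translated search has roughly six times as much sequence to compare per input. Whether mmseqs actually ran in translated mode for the command above is a separate question.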
To be fair, for nucleotide sequence search, you might need to compare mmseqs with tblastx.
If you just want to align some nucleotide sequences to the nt database, just use blastn or another tool like LexicMap. mmseqs is more sensitive than other tools and returns more divergent alignments.
Okay, got it! Thanks! I will search for other tools later.
Thanks for the help here, @shenwei356. I believe the search that @544728460 performed was actually on DNA sequences. A few points to consider:
MMseqs2 gets faster with more queries since it prebuilds an index. Alternatively, you can pre-generate an index using createindex, but this requires additional disk space and may need more RAM.
Additionally, I'm not entirely sure which algorithm runs by default in BLASTN. If it was running in megablast mode, the k-mer size would be much larger than in MMseqs2, which could lead to reduced sensitivity. For a fair comparison, sensitivity parameters should be adjusted accordingly.
Yes, I did run mmseqs on DNA sequences, and I also created the index files for the nt library, which served as the reference, to minimize the analysis time. Even so, I found mmseqs slower than blastn.
Does the index fit into memory? What is the k-mer length of blastn?
I think the index can of course fit into memory, since no error came out. And I didn't use megablast mode; I used the parameter "-task blastn". How can I find the specific k-mer length of blastn?
- By default, blastn runs in megablast mode, which uses a word size of 28 bp.
- For "-task blastn", the default word size changes to 11.
So maybe it's not the k-mer length that slows down the analysis?
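To illustrate why the word size matters, here is a small Python sketch that counts exact k-mer (word) matches between a toy query and target. It is not how blastn or MMseqs2 find seeds internally; the sequences and function names are made up for the example, and only the two word sizes (11 and 28) come from the list above.

```python
def kmers(seq, k):
    """Return the set of all words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_seeds(query, target, k):
    """Count distinct exact k-mer matches between query and target."""
    return len(kmers(query, k) & kmers(target, k))


# Hypothetical toy data: a 60 bp target and a query that differs from it
# at one position in every 20 bp (95% identity).
target = "ACGTTGCAAGCTTAGGCTAACGGATCCGTATTGCAGCCTAGGATCGATTACGCATGCAAT"
bases = list(target)
for pos in (10, 30, 50):
    bases[pos] = "A" if bases[pos] != "A" else "C"
query = "".join(bases)

for k in (11, 28):
    print(f"word size {k}: {shared_seeds(query, target, k)} shared seeds")
```

With a mismatch every 20 bp, no exact 28-mer survives, so a word size of 28 finds no seed at all, while a word size of 11 still finds several. Smaller words make a search more sensitive, but they also produce many more candidate hits to extend, which is why the two settings are hard to compare on speed alone.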