All vs All alignment of nucleotides

Open luisas opened this issue 1 year ago • 4 comments

I have a set of nucleotide sequences and I need the pairwise sequence similarity of all vs all.

I understand that one should create a fake prefilter result with fake_pref and use it to run mmseqs align. Yet the documentation says that the fake_pref() function cannot be used for nucleotides. Is there any way I can use mmseqs align to do an all-vs-all alignment for nucleotides?

Thanks a lot!

Luisa

luisas, Jun 25 '24 12:06

mmseqs easy-search dna.fas dna.fas res tmp --prefilter-mode 1 --search-type 3 --max-seqs 1000000

Prefilter Mode 1 should be the closest you can currently get with MMseqs2. This will run an exhaustive search with an ungapped prefiltering algorithm and then run SW/ksw2 on the accepted hits from the ungapped alignment.

It's not quite exhaustive SW, but it should be very close.
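
To gauge how far the result is from a full all-vs-all comparison, one option is to list the ordered pairs that have no row in the output. The following is a minimal sketch, assuming the default tab-separated easy-search output with query, target, and sequence identity in columns 1 to 3. The file `ids.txt` (one sequence ID per line) and `missing_pairs.tsv` are hypothetical names, and the `res` file written here is a tiny made-up stand-in with arbitrary values in columns 4 to 12.

```shell
# Made-up stand-in for the easy-search output "res" (12 tab-separated columns).
# printf reuses the format for each group of three arguments: query, target, identity.
printf '%s\t%s\t%s\t100\t0\t0\t1\t100\t1\t100\t1e-50\t180\n' \
    a a 1.000  a b 0.900  b a 0.900  b b 1.000  c c 1.000 > res

# Hypothetical list of all sequence IDs, one per line.
printf 'a\nb\nc\n' > ids.txt

# Print every ordered pair of IDs that has no row in res.
awk -F'\t' '
    FILENAME == ARGV[1] { seen[$1 FS $2] = 1; next }  # res: remember reported pairs
    { ids[++n] = $1 }                                 # ids.txt: collect all IDs
    END {
        for (i = 1; i <= n; i++)
            for (j = 1; j <= n; j++)
                if (!((ids[i] FS ids[j]) in seen))
                    print ids[i] "\t" ids[j]
    }
' res ids.txt > missing_pairs.tsv
```

With real data, `res` would be the file produced by the command above, and the number of lines in `missing_pairs.tsv` shows how many comparisons were dropped before alignment.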

milot-mirdita, Jun 26 '24 08:06

Hi,

thanks a lot for the fast reply.

I am also interested in comparisons between sequences with low sequence similarity. With the above command they unfortunately still get filtered out. I tried playing around with other command-line options, but as I understand it this is currently not possible. Is this correct?

Thanks a ton!

luisas, Jun 26 '24 10:06

Nucleotide sequence signal just isn't as conserved as protein sequence signal, so I don't think you'll be able to go much deeper in sequence identity than this procedure allows anyway.

You can also further lower the min diag score to let more hits pass through the ungapped prefilter.
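
As a rough sketch, and assuming the "min diag score" mentioned here corresponds to the `--min-ungapped-score` option (worth verifying against the help output of your MMseqs2 version), the earlier command could be extended as follows. The value 0 is purely illustrative, and the snippet only prints the command so it can be inspected before running.

```shell
# Assumption: "min diag score" maps to --min-ungapped-score; verify for your version.
min_score=0   # illustrative value, not a recommendation

# Same command as earlier in the thread, with the lowered threshold appended.
cmd="mmseqs easy-search dna.fas dna.fas res tmp --prefilter-mode 1 --search-type 3 --max-seqs 1000000 --min-ungapped-score ${min_score}"

# Print the command for inspection; run it afterwards with: eval "$cmd"
echo "$cmd"
```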

The better approach would be to do some profile alignment, but MMseqs2 doesn't support this for nucleotides yet, so nhmmer might be the way to go for now.

milot-mirdita, Jun 26 '24 13:06

Perfect! This helps a lot, thanks :)

luisas, Jun 26 '24 15:06