update mmseqs database with non-mmseqs database

Open mmpust opened this issue 3 years ago • 0 comments

Hey, thank you for providing and maintaining MMseqs2. I have the following workflow question. Let's assume I have

  • A 75 GB nucleotide database (x.fna) clustered (95% threshold) with a different method.
mmseqs createdb x.fna x_db
  • A 52 GB nucleotide database (y.fna) that was clustered with linclust (95% threshold).
mmseqs createdb y.fna y_db
mmseqs linclust y_db y_clust temp/ -c 0.95 --min-seq-id 0.95 --cov-mode 1

I want to combine databases X and Y without deletion:

mmseqs concatdbs y_db x_db mergedDB
mmseqs concatdbs y_db_h x_db_h mergedDB_h
mmseqs clusterupdate y_db mergedDB y_clust merged_seq merged_clust tmp --search-type 3 --min-seq-id 0.95 -c 0.8

Can I combine the databases even though the first one was not clustered with linclust? Is the proposed workflow correct, or would you recommend merging the FASTA files, generating a new MMseqs2 database, and re-clustering everything from scratch with linclust? Many thanks!
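
For reference, the full re-clustering alternative could be sketched roughly as below. This is only an illustration, not a verified recipe: the names merged.fna, merged_db, merged_clust and tmp are placeholders, the linclust parameters are simply copied from the y_db run above, and the DRY_RUN wrapper is a hypothetical helper that prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: merge both FASTA files and re-cluster from scratch with linclust.
# merged.fna, merged_db, merged_clust and tmp are placeholder names (assumptions).
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command and, outside dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

recluster() {
    # 1. Concatenate both FASTA files into a single input file.
    run sh -c 'cat x.fna y.fna > merged.fna'
    # 2. Build a fresh MMseqs2 sequence database from the merged FASTA.
    run mmseqs createdb merged.fna merged_db
    # 3. Cluster the merged database with the same parameters used for y_db.
    run mmseqs linclust merged_db merged_clust tmp -c 0.95 --min-seq-id 0.95 --cov-mode 1
}

recluster
```

With the default dry-run setting this only lists the three steps. Running it with DRY_RUN=0 would need enough disk space for the combined input (roughly 127 GB of FASTA) plus the temporary directory.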

mmpust, Jul 04 '22 18:07