
easy-linclust fails with "Ungapped alignment step died"

Open jackhu3301 opened this issue 1 year ago • 1 comments

Expected Behavior

I want to use easy-linclust to cluster protein sequences.

Current Behavior

mmseqs easy-linclust all_seq.fasta clusterRes tmp --cov-mode 0 --min-seq-id 0.4

MMseqs Output (for bugs)

Create directory tmp
easy-linclust all_seq.fasta clusterRes tmp --cov-mode 1 --min-seq-id 0.4

MMseqs Version: a14688744081c15439fa3092eec9dfd8be40440b
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Threads 64
Compressed 0
Verbosity 3
Weight file name
Cluster Weight threshold 0.9
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 0
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.4
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.8
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Alphabet size aa:21,nucl:5
k-mers per sequence 21
Spaced k-mers 0
Spaced k-mer pattern
Scale k-mers per sequence aa:0.000,nucl:0.200
Adjust k-mer length false
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
k-mer length 0
Shift hash 67
Split memory limit 0
Include only extendable false
Skip repeating k-mers false
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Remove temporary files true
Force restart with latest tmp false
MPI runner
Database type 0
Shuffle input database true
Createdb mode 1
Write lookup file 0
Offset of numeric ids 0

createdb all_seq.fasta tmp/8115150149931881526/input --dbtype 0 --shuffle 1 --createdb-mode 1 --write-lookup 0 --id-offset 0 --compressed 0 -v 3

Shuffle database cannot be combined with --createdb-mode 0
We recompute with --shuffle 0
Converting sequences
[Multiline fasta can not be combined with --createdb-mode 0
We recompute with --createdb-mode 1
Time for merging to input_h: 0h 0m 0s 3ms
Time for merging to input: 0h 0m 0s 3ms
[=======
Time for merging to input_h: 0h 0m 0s 2ms
Time for merging to input: 0h 0m 0s 2ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 102ms
Create directory tmp/8115150149931881526/clu_tmp
linclust tmp/8115150149931881526/input tmp/8115150149931881526/clu tmp/8115150149931881526/clu_tmp -e 0.001 --min-seq-id 0.4 -c 0.8 --cov-mode 1 --spaced-kmer-mode 0 --remove-tmp-files 1

Set cluster mode GREEDY MEM.
kmermatcher tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.4 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

kmermatcher tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.4 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 1 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Database size: 77298 type: Aminoacid
Reduced amino acid alphabet: (A S T) (C) (D B N) (E Q Z) (F Y) (G) (H) (I V) (K R) (L J M) (P) (W) (X)

Generate k-mers list for 1 split
[=================================================================] 77.30K 0s 41ms
Sort kmer 0h 0m 0s 46ms
Sort by rep. sequence 0h 0m 0s 22ms
Time for fill: 0h 0m 0s 11ms
Time for merging to pref: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 225ms
rescorediagonal tmp/8115150149931881526/input tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.5 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 64 --compressed 0 -v 3

[=================================================================] 77.30K 0s 71ms
Time for merging to pref_rescore1: 0h 0m 0s 102ms
Time for processing: 0h 0m 0s 429ms
clust tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore1 tmp/8115150149931881526/clu_tmp/13790714163985984779/pre_clust --cluster-mode 3 --max-iterations 1000 --similarity-type 2 --threads 64 --compressed 0 -v 3 --cluster-weight-threshold 0.9

Clustering mode: Greedy Low Mem
Total time: 0h 0m 0s 91ms

Size of the sequence database: 77298
Size of the alignment database: 77298
Number of clusters: 31445

Writing results 0h 0m 0s 3ms
Time for merging to pre_clust: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 188ms
createsubdb tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy tmp/8115150149931881526/input tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy -v 3 --subdb-mode 1

Time for merging to input_step_redundancy: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 22ms
createsubdb tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/pref tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter1 -v 3 --subdb-mode 1

Time for merging to pref_filter1: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 23ms
filterdb tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter1 tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter2 --filter-file tmp/8115150149931881526/clu_tmp/13790714163985984779/order_redundancy --threads 64 --compressed 0 -v 3

Filtering using file(s)
[=================================================================] 31.44K 0s 20ms
Time for merging to pref_filter2: 0h 0m 0s 88ms
Time for processing: 0h 0m 0s 360ms
rescorediagonal tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/input_step_redundancy tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_filter2 tmp/8115150149931881526/clu_tmp/13790714163985984779/pref_rescore2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 1 --wrapped-scoring 0 --filter-hits 1 -e 0.001 -c 0.8 -a 0 --cov-mode 1 --min-seq-id 0.4 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 64 --compressed 0 -v 3

[=========Error: Ungapped alignment step died
Error: Search died

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): a14688744081c15439fa3092eec9dfd8be40440b
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): Source install from github
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: GNU Make 4.1
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): SSE4
  • Operating system and version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

jackhu3301 Jul 24 '24 09:07

I have the same problem on Ubuntu under WSL.

Error: Search died

I found that the cause was invalid letters in some of the sequences in my FASTA file. After deleting those sequences, easy-linclust ran smoothly. I hope this helps.

Aldrich-ux Jul 18 '25 06:07
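
For anyone who wants to try the workaround described above, here is a minimal sketch of a pre-filter that finds FASTA records containing unexpected letters before running easy-linclust. It is not part of MMseqs2. The allowed set is an assumption taken from the letters printed in the "Reduced amino acid alphabet" line of the log, and MMseqs2 itself may tolerate more characters than this (for example, a trailing `*` would be flagged here). The function names and the output file name are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# Letters that appear in the "Reduced amino acid alphabet" line of the log.
# Assumption: anything outside this set is treated as invalid by this script;
# MMseqs2 itself may be more lenient.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" "BJZX")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def invalid_letters(seq: str) -> List[str]:
    """Return the sorted distinct characters of seq that are not in ALLOWED."""
    return sorted(set(seq.upper()) - ALLOWED)


def split_records(lines: Iterable[str]):
    """Split records into (clean, rejected).

    clean holds (header, sequence); rejected holds (header, offending letters).
    """
    clean, rejected = [], []
    for header, seq in read_fasta(lines):
        bad = invalid_letters(seq)
        if bad:
            rejected.append((header, bad))
        else:
            clean.append((header, seq))
    return clean, rejected


def filter_fasta(src_path: str, dst_path: str):
    """Write the clean records of src_path to dst_path; return the rejected ones."""
    with open(src_path) as src:
        clean, rejected = split_records(src)
    with open(dst_path, "w") as dst:
        for header, seq in clean:
            dst.write(f">{header}\n{seq}\n")
    return rejected
```

Calling `filter_fasta("all_seq.fasta", "all_seq.clean.fasta")` returns the headers of the rejected records together with the characters that triggered the rejection, so you can decide whether to drop or repair them. The cleaned file can then be passed to easy-linclust in place of the original input.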