
linclust.sh: line 76: Segmentation fault

Open • sinamajidian opened this issue on Jan 15, 2025 • 3 comments

Hi team! We use mmseqs easy-linclust in FastOMA for comparative genomics and orthology inference. Recently, some users have been experiencing a segmentation fault with mmseqs (like here).

Expected Behavior

When I use easy-linclust with mmseqs v14.7e284, it works well and generates the clusters reported in _cluster.tsv and _all_seqs.fasta.

conda install -c conda-forge -c bioconda mmseqs2=14.7e284
mmseqs easy-linclust --threads 2 singleton_unmapped.fa singleton_unmapped tmp_linclust  > log.out 2>&1
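When comparing a working and a failing version, a quick sanity check on the output is how many clusters were produced. The following is a minimal sketch, assuming the usual easy-linclust output naming (<prefix>_cluster.tsv, with the representative ID in the first tab-separated column and the member ID in the second); count_clusters and count_members are hypothetical helper names, not MMseqs2 commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for summarising an easy-linclust result table.
# Assumption: each line of <prefix>_cluster.tsv is "representative<TAB>member".

# Number of clusters = number of distinct representative IDs (column 1).
count_clusters() {
    local tsv="$1"
    cut -f1 "$tsv" | sort -u | wc -l | tr -d ' '
}

# Number of clustered sequences = number of lines (one per member).
count_members() {
    local tsv="$1"
    wc -l < "$tsv" | tr -d ' '
}
```

For example, count_clusters singleton_unmapped_cluster.tsv after a successful run gives a number that can be compared between versions.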

Current Behavior

When I use the latest mmseqs2 from conda (16.747c6), I get a segmentation fault.

I also built it from the GitHub source (0898eb9). The build succeeded, but I get the same segmentation fault:

Compute score and coverage
Query database size: 61247 type: Aminoacid
Target database size: 61247 type: Aminoacid
Calculation of alignments
[=========================tmp_linclust/758994687944913325/clu_tmp/4818287843092703793/linclust.sh: line 76: 409503 Segmentation fault      (core dumped) $RUNNER "$MMSEQS" "${ALIGN_MODULE}" "$INPUT" "$INPUT" "$RESULTDB" "${TMP_PATH}/aln" ${ALIGNMENT_PAR}
Error: Alignment step died
Error: Search died

Steps to Reproduce (for bugs)

I started each time from new folders.

try1) Failed with version 16.747c6

conda create -n mms python=3.12
conda activate mms
conda install -c conda-forge -c bioconda mmseqs2
mmseqs easy-linclust --threads 2 singleton_unmapped.fa singleton_unmapped tmp_linclust  > log.out 2>&1

try2) Failed with 0898eb9

$ date
Wed Jan 15 15:08:46 CET 2025
conda create -n mms14 python=3.12
conda activate mms14
conda install conda-forge::cmake

git clone git@github.com:soedinglab/MMseqs2.git
cd MMseqs2
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
make -j8
make install 

 $ ./MMseqs2/build/bin/mmseqs 
...
MMseqs2 Version: 0898eb901272f318bd099a4b7e56d221bbb050cc
mmseqs easy-linclust --threads 2 singleton_unmapped.fa singleton_unmapped tmp_linclust  > log.out 2>&1 

MMseqs Output (for bugs)

Log archives: successful_14.7e284.tar.gz, failed_16.747c6_089eb_.tar.gz

Context

This is the FASTA file: singleton_unmapped.fa.zip

Your Environment

I'm using a login node of my university's cluster, with 48 CPUs.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters): 0898eb901272f318bd099a4b7e56d221bbb050cc

  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.): I used the latest conda package and also compiled it myself from the latest GitHub source.

  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:

$ cmake --version
cmake version 3.31.4
$ g++ --version
g++ (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3)
 $ gcc --version
gcc (GCC) 11.4.1 20231218 (Red Hat 11.4.1-3)
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  48
  On-line CPU(s) list:   0-47
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7402 24-Core Processor
..
$  grep -o 'avx[^ ]*' /proc/cpuinfo | head -2
avx
avx2
$  grep -o 'sse[^ ]*' /proc/cpuinfo | head -2
sse
sse2

$ cat /proc/meminfo
MemTotal:       527942460 kB
MemFree:         5523840 kB
MemAvailable:   425549692 kB
  • Operating system and version:
$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"
..
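The grep ... | head -2 commands above only print the first two matching flags, so they cannot show whether SSE4.1 is present. Below is a minimal sketch of a per-flag check, assuming a Linux /proc/cpuinfo; has_cpu_flag and report_simd are hypothetical helper names. The flags checked (sse4_1, avx2) correspond to the SIMD levels that MMseqs2's precompiled Linux binaries target; this is only an environment check and is unrelated to the crash fixed later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: test for each CPU flag as a whole word
# instead of printing only the first few partial matches.

# Succeeds if the flag appears as a whole word in the cpuinfo file.
has_cpu_flag() {
    local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
    grep -qw -- "$flag" "$cpuinfo"
}

# Prints one "flag: yes/no" line per SIMD extension of interest.
report_simd() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local f
    for f in sse4_1 avx2; do
        if has_cpu_flag "$f" "$cpuinfo"; then
            echo "$f: yes"
        else
            echo "$f: no"
        fi
    done
}
```

For example, running report_simd on the server above prints one line per flag, such as avx2: yes.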

The previous GitHub issue here is related to this one.

sinamajidian, Jan 15 '25 14:01

Thank you for the report. Should be fixed with commit 492297bd7af184ccebbc18c638b8e254484c40f5

The problem was introduced by changes to our GPU database code. Your sequence *RRTVALGFHPTNPLQFP...RKGLNH*TALTLLVP*QFENLFGPCR, which begins with *, triggered the GPU sequence mapping code.

martin-steinegger, Jan 18 '25 10:01
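To check whether an input FASTA contains records like the one described above, one could list the records whose sequence begins with '*'. This is a minimal sketch and not part of the fix in commit 492297b; find_leading_star is a hypothetical helper, and it only flags the leading-'*' pattern named in this thread (internal '*' characters are not reported). Upgrading to a release that contains the fix remains the actual solution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the header (without '>') of every FASTA
# record whose sequence starts with '*'. Handles multi-line FASTA by
# looking only at the first non-empty sequence line of each record.
find_leading_star() {
    local fasta="$1"
    awk '
        /^>/ { header = substr($0, 2); first = 1; next }
        first && NF {
            if (substr($0, 1, 1) == "*") print header
            first = 0
        }
    ' "$fasta"
}
```

For example, find_leading_star singleton_unmapped.fa would list the affected records, if any.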

Great, thanks! I can confirm that this solved the issue. Looking forward to having it on conda too.

Best, Sina

sinamajidian, Jan 18 '25 14:01

It's on the way; we made a new release: https://github.com/soedinglab/MMseqs2/releases/tag/17-b804f

martin-steinegger, Jan 18 '25 15:01