MMseqs2 Ungapped prefilter died during GPU-accelerated search

When I ran the script below to generate an MSA against uniref90 with MMseqs2 GPU-accelerated search, it reported the error shown below. However, when I replaced the uniref90 targetDB with a smaller one (on the order of thousands of sequences), the error did not appear. My Linux system has:

40 CPUs,
400+ GB RAM,
4 GPUs,
enough storage.

Could you please help me figure this out? Thanks in advance for your expert help.

Error

ungappedprefilter /a100_nas/ai4s/MSA/queries/testDB /a100_nas/ai4s/MSA/uniref90DB_gpu.idx /a100_nas/ai4s/MSA/tmp/15602816422822286028/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.001 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 300 --db-load-mode 2 --gpu 1 --gpu-server 1 --prefilter-mode 1 --threads 42 --compressed 0 -v 3 

Index version: 16
Generated by:  16.747c6
ScoreMatrix:  VTML80.out
--gpu-server /dev/shm/8478586279687262130 does not existError: Ungapped prefilter died

Part of my script

mmseqs createdb $UNIREFFASTA $UNIREFDB
mmseqs makepaddedseqdb $UNIREFDB $UNIREFDB_GPU

QUERY_DB="${QUERY_DIR}/testDB"
mmseqs createdb $QUERY $QUERY_DB

# searching with GPU 
mmseqs createindex $UNIREFDB_GPU $TMP --index-subset 2
mmseqs gpuserver $UNIREFDB_GPU --gpu 1 &
PID=$!
mmseqs search $QUERY_DB $UNIREFDB_GPU $RESULTS/aln_test $TMP --gpu 1 --gpu-server 1 --db-load-mode 2 --remove-tmp-files 1 --max-seq 10000
kill $PID

Dec 11 '24 06:12 TheChosenOneJG

Hello, it seems I have the same problem as you. Did you run mmseqs makepaddedseqdb on the targetDB, and did it return a segmentation fault?

Dec 11 '24 07:12 unknow1024

Hello, I did run mmseqs makepaddedseqdb, and it completed without any errors. How large is your targetDB?

Dec 11 '24 08:12 TheChosenOneJG

Nearly 43 GB. It is uniclust30-hhsuite, and I used mmseqs createdb to build the targetDB.

Dec 11 '24 08:12 unknow1024

Sorry, I missed this issue. I recommend adding a sleep command after starting the gpuserver to make sure it's actually ready.

This is definitely something we still need to improve in the future.
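
A minimal sketch of that suggestion, reusing the variable names from the script in the original post. The helper name start_background_and_wait and the 30-second delay are assumptions for illustration and are not part of MMseqs2; a large database may need longer to load, so the delay should be adjusted to the setup.

```shell
#!/usr/bin/env bash
# Start a command in the background, wait a fixed delay, then confirm it is still running.
# Usage: start_background_and_wait <delay_seconds> <command> [args...]
# Sets SERVER_PID; returns 1 if the process exited during the delay.
start_background_and_wait() {
    local delay="$1"
    shift
    "$@" &
    SERVER_PID=$!
    sleep "$delay"
    # kill -0 sends no signal; it only checks that the process still exists.
    if ! kill -0 "$SERVER_PID" 2>/dev/null; then
        echo "background process $SERVER_PID exited before the search started" >&2
        return 1
    fi
}

# Runs only when mmseqs is installed and the database variable from the original script is set.
if command -v mmseqs >/dev/null 2>&1 && [ -n "${UNIREFDB_GPU:-}" ]; then
    # 30 seconds is an assumed delay, not a recommended value.
    if start_background_and_wait 30 mmseqs gpuserver "$UNIREFDB_GPU" --gpu 1; then
        mmseqs search "$QUERY_DB" "$UNIREFDB_GPU" "$RESULTS/aln_test" "$TMP" \
            --gpu 1 --gpu-server 1 --db-load-mode 2 --remove-tmp-files 1 --max-seqs 10000
        kill "$SERVER_PID"
    fi
fi
```

Checking that the server process is still alive after the delay means a crashed gpuserver is reported directly instead of surfacing later as a failed search.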

Jan 16 '25 18:01 milot-mirdita

Thank you for your reply! It helps a lot!

Jan 17 '25 09:01 TheChosenOneJG

Hi, I'm experiencing the same issue when trying to start a GPU server with a targetDB_pad. Adding a sleep command after starting the GPU server doesn't resolve the issue for me (I tried sleeps of up to 120 seconds).

I'm using the latest Docker container of mmseqs2-cuda12 and running it on an NVIDIA A40 (CUDA v12.2) with 900 GB RAM and 48 CPUs:

#!/bin/bash
#SBATCH -D ./
#SBATCH -J docker_mmseqs
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=900000
#SBATCH --time=1:00:00
## other sbatch stuff...

CACHE="/home/path/to/singularity/cache"
SIF="/home/path/to/container/mmseqs2_master-cuda12.sif"

# make GPU DB from CPU DB
singularity run --nv \
    -B $CACHE:/cache \
    -B $(pwd):/work \
    $SIF makepaddedseqdb \
    /work/db/targetDB \
    /work/db/targetDB_pad

# start GPU server
singularity run --nv \
    -B $CACHE:/cache \
    -B $(pwd):/work \
    $SIF gpuserver \
    /work/db/targetDB_pad --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1 &
PID1=$!

sleep 120

# run mmseqs
singularity run --nv \
    -B $CACHE:/cache \
    -B $(pwd):/work \
    $SIF easy-search \
    /work/input.fasta \
    /work/db/targetDB_pad \
    /work/output.m8 \
    /work/tmp \
    --gpu 1 \
    --gpu-server 1 \
    --remove-tmp-files 1

The stdout:

/work/db/targetDB_pad exists and will be overwritten
makepaddedseqdb /work/db/targetDB /work/db/targetDB_pad

MMseqs Version:                         eaecacf4ba24e9c8a0f2a1da115603ebc80710ad
Substitution matrix                     aa:blosum62.out,nucl:nucleotide.out
Score bias                              0
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Mask lower letter repeating N times     0
Write lookup file                       1
Threads                                 64
Verbosity                               3

[=================================================================] 3.26K 0s 15ms
Time for merging to hydDB1_pad: 0h 0m 0s 190ms
Time for merging to hydDB1_pad_h: 0h 0m 0s 152ms
Time for processing: 0h 0m 1s 216ms
gpuserver /work/db/targetDB_pad --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1

MMseqs Version:         eaecacf4ba24e9c8a0f2a1da115603ebc80710ad
Use GPU                 0
Max results per query   10000
Preload mode            1
Prefilter mode          1

374968733484103649
easy-search /work/input.fasta /work/db/targetDB_pad /work/output.m8 /work/tmp --gpu 1 --gpu-server 1 --remove-tmp-files 1

MMseqs Version:                         eaecacf4ba24e9c8a0f2a1da115603ebc80710ad
Substitution matrix                     aa:blosum62.out,nucl:nucleotide.out
Add backtrace                           false
Alignment mode                          3
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Coverage threshold                      0
Coverage mode                           0
Max sequence length                     65535
Compositional bias                      1
Compositional bias scale                1
Max reject                              2147483647
Max accept                              2147483647
Include identical seq. id.              false
Preload mode                            0
Pseudo count a                          substitution:1.100,context:1.400
Pseudo count b                          substitution:4.100,context:5.800
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Correlation score weight                0
Gap open cost                           aa:11,nucl:5
Gap extension cost                      aa:1,nucl:2
Zdrop                                   40
Threads                                 64
Compressed                              0
Verbosity                               3
Seed substitution matrix                aa:VTML80.out,nucl:nucleotide.out
Sensitivity                             5.7
k-mer length                            0
Target search mode                      0
k-score                                 seq:2147483647,prof:2147483647
Alphabet size                           aa:21,nucl:5
Max results per query                   300
Split database                          0
Split mode                              2
Split memory limit                      0
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Mask lower letter repeating N times     0
Minimum diagonal score                  15
Selected taxa
Spaced k-mers                           1
Spaced k-mer pattern
Local temporary path
Use GPU                                 1
Use GPU server                          1
Wait for GPU server                     600
Prefilter mode                          0
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Mask profile                            1
Profile E-value threshold               0.001
Global sequence weighting               false
Allow deletions                         false
Filter MSA                              1
Use filter only at N seqs               0
Maximum seq. id. threshold              0.9
Minimum seq. id.                        0.0
Minimum score per column                -20
Minimum coverage                        0
Select N most diverse seqs              1000
Pseudo count mode                       0
Profile output mode                     0
Min codons in orf                       30
Max codons in length                    32734
Max orf gaps                            2147483647
Contig start mode                       2
Contig end mode                         2
Orf start mode                          1
Forward frames                          1,2,3
Reverse frames                          1,2,3
Translation table                       1
Translate orf                           0
Use all table starts                    false
Offset of numeric ids                   0
Create lookup                           0
Overlap between sequences               0
Sequence split mode                     1
Header split mode                       0
Chain overlapping alignments            0
Merge query                             1
Search type                             0
Search iterations                       1
Start sensitivity                       4
Search steps                            1
Exhaustive search mode                  false
Filter results during exhaustive search 0
Strand selection                        1
LCA search mode                         false
Disk space limit                        0
MPI runner
Force restart with latest tmp           false
Remove temporary files                  true
Translation mode                        0
Alignment format                        0
Format alignment output                 query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output                         false
Overlap threshold                       0
Database type                           0
Shuffle input database                  true
Createdb mode                           0
Write lookup file                       0
Greedy best hits                        false

search /work/tmp/995132545111804393/query /work/db/targetDB_pad /work/tmp/995132545111804393/result /work/tmp/995132545111804393/search_tmp --alignment-mode 3 -s 5.7 --gpu 1 --gpu-server 1 --remove-tmp-files 1

Error: Ungapped prefilter died
Error: Search died

And the stderr:

INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (259) bind mounts
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (259) bind mounts
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (259) bind mounts
malloc(): corrupted top size
Aborted (core dumped)

If I skip starting a GPU server for the targetDB_pad, the search works normally. So for now I'll just skip the gpuserver step, but I was wondering whether there is any way to resolve this issue. Thanks.

Mar 06 '25 23:03 jlingford

What query/target set is this? The one in example/?

Mar 07 '25 07:03 milot-mirdita

No, this is one I built with mmseqs createdb. It contains 3261 sequences of roughly 400-900 aa in length. The query FASTA file is also my own:

>NuoD
MTEKYAPPIPETSDYAISVGPQHPTHKEPVRFIFQVKGETVQDVDLRIGFNHRGIEKAFENRTWLKNLYLVTRLCGICSVAHQLAYVHAAEKCMIIQDSVPERAHFIRLIIAELERVQSHILWYGVLAHDTGYDTLFHITWRDREIVNDILELISGNRVNYAMYTLGGVRRDISREQKEKIVPKLKDLRKKCEYHRAVMMKERSFIVRQKGVAILSKKDAKKYCAVGPTVRASGVNIDLRKVDPYSVYDKVSFDVPLYSEGDILGGLYNRLDETLISIDIILDALDAMPAGDIRLPWREVPRRPETSEGIQRVEAPRGEDIHYIRSNGTDKPDRHKIRAPTFQNFPSLVHRLKGVQVADIPPVIRVIDPCIGCCERVTFVKAGSRKKLTLNGHHLVSRANRFYRSGTKVLDF

Mar 07 '25 08:03 jlingford