easy-taxonomy error when using GTDB
Thanks for maintaining this great software! I'm having an issue with easy-taxonomy when using GTDB (the same query input works with NCBI), described below. I'm using a conda install of version 15.6f452.
Thanks for any help!
Expected Behavior
Completing without error
Current Behavior
Fails at aggregatetaxweights with the following:
Missing key 0 in tax result ] 0.00% 1 eta -
Error: aggregatetaxweights died
Error: Search died
Full log here: easy-tax-full-log-error.txt
Steps to Reproduce (for bugs)
Install
mamba create -y -n mmseqs2 -c conda-forge -c bioconda -c defaults mmseqs2==15.6f452
conda activate mmseqs2
DB setup
mmseqs databases GTDB mmseqs2-GTDB-db tmp
Making small test data
wget -O e-coli.fasta.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz
gunzip e-coli.fasta.gz
grep -c ">" e-coli.fasta
# there is only one contig, so safe to just pull some lines
printf ">contig_1\n" > contigs.fasta
sed -n '100,1200p' e-coli.fasta >> contigs.fasta
printf ">contig_2\n" >> contigs.fasta
sed -n '20000,20600p' e-coli.fasta >> contigs.fasta
printf ">contig_3\n" >> contigs.fasta
sed -n '26000,26200p' e-coli.fasta >> contigs.fasta
# that's 3 contigs: 88,000 bps; 48,000 bps; and 16,000 bps
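To confirm the test file looks as described before running the search, the contig lengths can be checked directly. This is a minimal sketch, not part of the original report: `contig_lengths` is a hypothetical helper name, and the lengths will only match the comment above if the downloaded FASTA wraps sequence at 80 bp per line, which is an assumption here.

```shell
# Hypothetical helper (not part of MMseqs2): print "<name>\t<length>"
# for every record in a FASTA file, counting only sequence characters.
contig_lengths() {
    awk '
        /^>/ {
            # New header: flush the previous record, if any
            if (name != "") print name "\t" len
            name = substr($1, 2)   # drop the leading ">"
            len = 0
            next
        }
        { len += length($0) }      # accumulate sequence line lengths
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Run against the test file built above, if it exists
if [ -f contigs.fasta ]; then
    contig_lengths contigs.fasta
fi
```

Three non-empty records with distinct names would rule out a malformed query file as the cause of the failure.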
Running the program
mmseqs easy-taxonomy contigs.fasta mmseqs2-GTDB-db GTDB-tax-result tax-tmp \
--threads 20 --tax-lineage 1 --compressed 1 --remove-tmp-files 0
MMseqs Output (for bugs)
Fails at aggregatetaxweights with the following:
Missing key 0 in tax result ] 0.00% 1 eta -
Error: aggregatetaxweights died
Error: Search died
Full log here: easy-tax-full-log-error.txt
Context
Trying to get taxonomy output via GTDB with lineage info added. Using the NCBI database completes successfully on the same input query.
Your Environment
- mmseqs2 version 15.6f452 installed with conda
- working on Ubuntu 20.04.4 LTS
- 500 GB memory
I am having a very similar error:
Current behaviour
After launching an mmseqs taxonomy run, the following subcommand is executed and dies:
aggregatetaxweights mmseqs_database/database tmp1/14824571404584235274/orfs_h_swapped tmp1/14824571404584235274/orfs_tax tmp1/14824571404584235274/orfs_tax_aln SWH_IN_taxonomy/SWH_IN --lca-ranks kingdom,phylum,class,order,family,genus,species --tax-lineage 1 --compressed 1 --threads 12 -v 3
MMseqs output
Missing key 0 in tax result
tmp1/14824571404584235274/taxpercontig.sh: line 85: 206297 Aborted (core dumped) "$MMSEQS" aggregatetaxweights "${TAX_SEQ_DB}" "${TMP_PATH}/orfs_h_swapped" "${TMP_PATH}/orfs_tax" "${TMP_PATH}/orfs_tax_aln" "${RESULTS}" ${AGGREGATETAX_PAR}
Error: aggregatetaxweights died
Environment
- singularity container of mmseqs2 version 15.6f452 (build pl5321h6a68c12_0)
- HPC (Linux + slurm)
- 950 GB RAM
Comment
I know that mmseqs taxonomy classification with GTDB needs at least 900 GB of RAM, so I am not surprised that your process died, @AstrobioMike. Since my error looks very similar (if not identical), I wonder whether even my 950 GB of RAM is not enough.