Not enough GFF files are provided. Some results might be omitted Error: gff2db failed
Expected Behavior
No error.
Current Behavior
Not enough GFF files are provided. Some results might be omitted
tmpFolder/15806226240676100088/createsetdb.sh: line 130: 9468 Segmentation fault "${MMSEQS}" gff2db "${@}" "${TMP_PATH}/seqDB" "${OUTDB}_nucl" ${GFF2DB_PAR}
Error: gff2db failed
Steps to Reproduce (for bugs)
I downloaded 66 genome assemblies from NCBI in FASTA format:
conda activate ncbi_datasets
datasets download genome accession GCA_019927265.1 GCA_019927285.1 GCA_019927305.1 GCA_019927315.1 GCA_019927325.1 GCA_019927345.1 GCA_019927385.1 GCA_019927405.1 GCA_019927425.1 GCA_019927435.1 GCA_019927465.1 GCA_019927485.1 GCA_019927505.1 GCA_019927525.1 GCA_019927535.1 GCA_019927565.1 GCA_019927585.1 GCA_019927605.1 GCA_019927625.1 GCA_019927645.1 GCA_019927655.1 GCA_019927685.1 GCA_019927705.1 GCA_019927725.1 GCA_019927745.1 GCA_019927765.1 GCA_019927785.1 GCA_019927805.1 GCA_019927825.1 GCA_019927845.1 GCA_019927855.1 GCA_019927885.1 GCA_019927905.1 GCA_019927925.1 GCA_019927945.1 GCA_019927965.1 GCA_019927985.1 GCA_019928005.1 GCA_019928025.1 GCA_019928045.1 GCA_019928065.1 GCA_019928075.1 GCA_019928105.1 GCA_019928115.1 GCA_019928145.1 GCA_019928165.1 GCA_019928185.1 GCA_019928205.1 GCA_019928215.1 GCA_019928245.1 GCA_019928265.1 GCA_019928285.1 GCA_019928305.1 GCA_019928325.1 GCA_019928345.1 GCA_019928365.1 GCA_019928385.1 GCA_019928395.1 GCA_019928405.1 GCA_019928445.1 GCA_019928465.1 GCA_019928485.1 GCA_019928505.1 GCA_019928525.1 GCA_019928545.1 GCA_019928565.1
For each one of them I ran:
prodigal -i ASM1992728v1.fna -o ASM1992728v1.gff -a ASM1992728v1.faa -f gff
and created gffDir.txt with:
find "$(pwd)" -name "*.gff" > gffDir.txt
Then I ran:
spacedust createsetdb ./*.fna setDB tmpFolder --gff-dir gffDir.txt --gff-type CDS
and got this error:
Not enough GFF files are provided. Some results might be omitted
tmpFolder/15806226240676100088/createsetdb.sh: line 130: 9468 Segmentation fault "${MMSEQS}" gff2db "${@}" "${TMP_PATH}/seqDB" "${OUTDB}_nucl" ${GFF2DB_PAR}
Error: gff2db failed
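The preparation steps above (one Prodigal run per assembly, then the GFF list) can be scripted. This is a minimal sketch under two assumptions: the assemblies have already been extracted from the `datasets` download into `.fna` files in one directory, and `createsetdb` pairs the entries of the `--gff-dir` list with the input FASTA files by position (the thread does not confirm this). Unlike `find`, which returns files in directory-traversal order and picks up every `.gff` below the current directory, the list here is written in the same order as the `./*.fna` glob. The function names are hypothetical.

```shell
# Hypothetical helpers (names are illustrative, not part of spacedust).
# Assumes every assembly is a .fna file and that Prodigal's outputs share
# the FASTA basename (x.fna -> x.gff, x.faa).

# Run prodigal once per assembly with the flags used in this report.
run_prodigal_all() {
  local fna base
  for fna in "$@"; do
    base="${fna%.fna}"
    prodigal -i "$fna" -o "$base.gff" -a "$base.faa" -f gff || return 1
  done
}

# Write one absolute GFF path per FASTA input, in argument order, so the
# list lines up with the files passed to createsetdb.
# Assumption: createsetdb pairs GFF entries with FASTA inputs by position.
write_gff_list() {
  local out="$1" fna gff
  shift
  : > "$out"
  for fna in "$@"; do
    gff="${fna%.fna}.gff"
    if [ ! -f "$gff" ]; then
      echo "missing GFF for $fna" >&2
      return 1
    fi
    gff="${gff#./}"
    case "$gff" in
      /*) ;;
      *) gff="$PWD/$gff" ;;
    esac
    # printf terminates every entry, including the last, with a newline
    printf '%s\n' "$gff" >> "$out"
  done
}

# Usage:
#   run_prodigal_all ./*.fna
#   write_gff_list gffDir.txt ./*.fna
#   spacedust createsetdb ./*.fna setDB tmpFolder --gff-dir gffDir.txt --gff-type CDS
```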
Hi! Sorry for the delay.
I have found the bug in gff2db, but the fix may take some time because that module is part of MMseqs2.
I will post here once a fix is available.
Alternatively, you could give the Prodigal .faa files directly as input.
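A sketch of that workaround, assuming the `.faa` files written by `prodigal -a` are in the working directory. `createsetdb` is called without `--gff-dir`/`--gff-type`, so the GFF conversion that crashes is presumably not requested. The wrapper name is hypothetical; it only adds a guard against an unmatched glob being passed through literally.

```shell
# Hypothetical wrapper for the suggested workaround: give createsetdb the
# protein FASTA files from `prodigal -a` and no --gff-dir, so the GFF
# conversion step is not requested.
createsetdb_from_faa() {
  local out_db="$1" tmp_dir="$2" faa
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "no .faa files given" >&2
    return 1
  fi
  for faa in "$@"; do
    # An unmatched glob such as ./*.faa is passed through literally; catch it.
    if [ ! -f "$faa" ]; then
      echo "not a file: $faa" >&2
      return 1
    fi
  done
  spacedust createsetdb "$@" "$out_db" "$tmp_dir"
}

# Usage: createsetdb_from_faa setDB tmpFolder ./*.faa
```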
Hi @RuoshiZhang, I can reproduce the same error with both NCBI GFF3 files and Prodigal-generated GFF3 files. Is there any update on this?
I get the same error with the .fna and .gff3 files from the JAVZHU010000001 entry on NCBI:
https://www.ncbi.nlm.nih.gov/nuccore/JAVZHU010000001.1
I'm encountering the same error when running spacedust createsetdb with a genome .fna file and its corresponding .gff annotation downloaded from NCBI.
bin/spacedust: Argument list too long
Error: gff2db failed
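In this thread the value of `--gff-dir` is a plain-text list of GFF paths, one per line (`gffDir.txt` above, `examples/gff.txt` below), not a GFF file. The comment does not say what was passed, so the following is only a pre-flight check, not a diagnosis: it flags a list that looks like GFF content (one conceivable way to produce an oversized command line, if each record were treated as a path; this is an assumption) and any listed path that does not exist. The function name is hypothetical.

```shell
# Hypothetical pre-flight check for the file passed to --gff-dir: it should
# list one GFF path per line, not contain GFF records itself.
check_gff_list() {
  local list="$1" path status=0
  # GFF3 files normally start with this pragma line
  if head -n 1 "$list" | grep -q '^##gff-version'; then
    echo "$list looks like a GFF file, not a list of GFF paths" >&2
    return 1
  fi
  # The `|| [ -n "$path" ]` also reads a last line that lacks a newline
  while IFS= read -r path || [ -n "$path" ]; do
    if [ -n "$path" ] && [ ! -f "$path" ]; then
      echo "listed GFF not found: $path" >&2
      status=1
    fi
  done < "$list"
  return "$status"
}

# Usage: check_gff_list gffDir.txt
```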
Hi, is there any update on this bug? I'm getting a similar error message with the example .fna and .gff files provided in this repo: "Not enough GFF files are provided. Some results might be omitted".
I am using the latest spacedust, foldseek, and mmseqs binaries, by the way.
E.g.:
spacedust createsetdb examples/uvig_120081.fna examples/uvig_255655.fna setDB/example_trial tmp --gff-dir examples/gff.txt --gff-type CDS
createsetdb examples/uvig_120081.fna examples/uvig_255655.fna setDB/example_trial tmp --gff-dir examples/gff.txt --gff-type CDS
MMseqs Version: 5358214da8764737aa01af485b682729bb8d3ace
Database type 0
Shuffle input database false
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Create lookup 0
Threads 12
Add orf stop false
GFF type CDS
Statistics to be computed
Tsv false
File Inclusion Regex .*
File Exclusion Regex ^$
gff dir file examples/gff.txt
createdb examples/uvig_120081.fna examples/uvig_255655.fna tmp/15431538959630537525/seqDB --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 1 --id-offset 0 --compressed 0 -v 3
Converting sequences
Time for merging to seqDB_h: 0h 0m 0s 0ms
Time for merging to seqDB: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 3ms
Input DB type is Nucleotide.
gff2db examples/uvig_120081.gff tmp/15431538959630537525/seqDB /home/james/Documents/biotools/spacedust/setDB/example_trial_nucl --gff-type CDS --id-offset 0 --threads 12 -v 3
Not enough GFF files are provided. Some results might be omitted
[=================================================================] 100.00% 1 eta -
Time for merging to example_trial_nucl.lookup: 0h 0m 0s 1ms
Time for merging to example_trial_nucl_h: 0h 0m 0s 0ms
Time for merging to example_trial_nucl: 0h 0m 0s 1ms
Found these feature types and counts:
- CDS: 53
Time for processing: 0h 0m 0s 27ms
translatenucs /home/james/Documents/biotools/spacedust/setDB/example_trial_nucl /home/james/Documents/biotools/spacedust/setDB/example_trial --translation-table 1 --add-orf-stop 0 -v 3 --compressed 0 --threads 12
[=================================================================] 100.00% 53 0s 1ms
Time for merging to example_trial: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 15ms
tsv2db /home/james/Documents/biotools/spacedust/setDB/example_trial_member_to_set.tsv /home/james/Documents/biotools/spacedust/setDB/example_trial_member_to_set --output-dbtype 5
Output database type: Alignment
Time for merging to example_trial_member_to_set: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 0ms
tsv2db /home/james/Documents/biotools/spacedust/setDB/example_trial_set_to_member.tsv /home/james/Documents/biotools/spacedust/setDB/example_trial_set_to_member --output-dbtype 5
Output database type: Alignment
Time for merging to example_trial_set_to_member: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 0ms
result2stats /home/james/Documents/biotools/spacedust/setDB/example_trial /home/james/Documents/biotools/spacedust/setDB/example_trial /home/james/Documents/biotools/spacedust/setDB/example_trial_set_to_member /home/james/Documents/biotools/spacedust/setDB/example_trial_set_size --stat linecount --tsv 0 --compressed 0 --threads 12 -v 3
[=================================================================] 100.00% 1 eta -
Time for merging to example_trial_set_size: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 2ms
That said, the run does not fail the way it does in other users' reports.
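In the log above, `createdb` receives both FASTA files, but the `gff2db` call lists only `examples/uvig_120081.gff`, which matches the "Not enough GFF files" warning. A quick way to compare the two counts is sketched below (hypothetical helper names). It also counts a final entry that lacks a trailing newline, which `wc -l` would miss and which a plain `while read` loop skips; whether `examples/gff.txt` actually ends that way is not established in this thread.

```shell
# Hypothetical helpers: compare the number of entries in a --gff-dir list
# with the number of FASTA inputs given to createsetdb.

# Count non-empty lines, including a last line with no trailing newline.
count_list_entries() {
  local n=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -n "$line" ]; then
      n=$((n + 1))
    fi
  done < "$1"
  echo "$n"
}

check_gff_count() {
  local list="$1" entries
  shift
  entries=$(count_list_entries "$list")
  if [ "$entries" -ne "$#" ]; then
    echo "$list lists $entries GFF file(s) for $# FASTA input(s)" >&2
    return 1
  fi
  # $(...) strips a trailing newline, so a non-empty result means none was there
  if [ -n "$(tail -c 1 "$list")" ]; then
    echo "note: $list does not end with a newline" >&2
  fi
}

# Usage: check_gff_count examples/gff.txt examples/uvig_120081.fna examples/uvig_255655.fna
```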