
Not enough GFF files are provided. Some results might be omitted Error: gff2db failed

Open abrozzi opened this issue 1 year ago • 4 comments

Expected Behavior

No error.

Current Behavior

Not enough GFF files are provided. Some results might be omitted
tmpFolder/15806226240676100088/createsetdb.sh: line 130:  9468 Segmentation fault      "${MMSEQS}" gff2db "${@}" "${TMP_PATH}/seqDB" "${OUTDB}_nucl" ${GFF2DB_PAR}
Error: gff2db failed

Steps to Reproduce (for bugs)

I downloaded 66 genome assemblies from NCBI in FASTA format:

conda activate ncbi_datasets
datasets download genome accession GCA_019927265.1 GCA_019927285.1 GCA_019927305.1 GCA_019927315.1 GCA_019927325.1 GCA_019927345.1 GCA_019927385.1 GCA_019927405.1 GCA_019927425.1 GCA_019927435.1 GCA_019927465.1 GCA_019927485.1 GCA_019927505.1 GCA_019927525.1 GCA_019927535.1 GCA_019927565.1 GCA_019927585.1 GCA_019927605.1 GCA_019927625.1 GCA_019927645.1 GCA_019927655.1 GCA_019927685.1 GCA_019927705.1 GCA_019927725.1 GCA_019927745.1 GCA_019927765.1 GCA_019927785.1 GCA_019927805.1 GCA_019927825.1 GCA_019927845.1 GCA_019927855.1 GCA_019927885.1 GCA_019927905.1 GCA_019927925.1 GCA_019927945.1 GCA_019927965.1 GCA_019927985.1 GCA_019928005.1 GCA_019928025.1 GCA_019928045.1 GCA_019928065.1 GCA_019928075.1 GCA_019928105.1 GCA_019928115.1 GCA_019928145.1 GCA_019928165.1 GCA_019928185.1 GCA_019928205.1 GCA_019928215.1 GCA_019928245.1 GCA_019928265.1 GCA_019928285.1 GCA_019928305.1 GCA_019928325.1 GCA_019928345.1 GCA_019928365.1 GCA_019928385.1 GCA_019928395.1 GCA_019928405.1 GCA_019928445.1 GCA_019928465.1 GCA_019928485.1 GCA_019928505.1 GCA_019928525.1 GCA_019928545.1 GCA_019928565.1

For each one of them I ran:

prodigal -i ASM1992728v1.fna -o ASM1992728v1.gff -a ASM1992728v1.faa -f gff

and created the gffDir.txt

find "$(pwd)" -name "*.gff" > gffDir.txt

then I ran:

spacedust createsetdb ./*.fna setDB tmpFolder --gff-dir gffDir.txt --gff-type CDS

and got the error:

Not enough GFF files are provided. Some results might be omitted
tmpFolder/15806226240676100088/createsetdb.sh: line 130:  9468 Segmentation fault      "${MMSEQS}" gff2db "${@}" "${TMP_PATH}/seqDB" "${OUTDB}_nucl" ${GFF2DB_PAR}
Error: gff2db failed

abrozzi, Oct 11 '24 10:10

Hi! Sorry for the delay. I have found the bug in gff2db, but it might take a bit of time to get it fixed, because this module is part of MMseqs2. I will post here when there is an updated fix. Alternatively, you could give the Prodigal .faa files directly as the input.

RuoshiZhang, Oct 23 '24 14:10
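
The workaround suggested above (passing the Prodigal .faa files directly) could look roughly like the sketch below. It assumes that protein input lets createsetdb skip the gff2db step, which is what the suggestion implies but is not verified here. The function name and the SPACEDUST override variable are hypothetical helpers for illustration; setDB and tmpFolder are the names used in the original report.

```shell
#!/bin/sh
# Sketch only: build the set database from Prodigal protein FASTA files
# (*.faa) instead of nucleotide FASTA plus GFF.
# Assumption: all .faa files live in one directory, given as $1.
# SPACEDUST is a hypothetical override for the binary path (default: spacedust).

run_createsetdb_from_faa() {
    faa_dir=$1
    # Replace the positional parameters with the matching .faa files.
    set -- "$faa_dir"/*.faa
    # If the glob matched nothing, $1 is still the literal pattern.
    if [ ! -e "$1" ]; then
        echo "no .faa files found in $faa_dir" >&2
        return 1
    fi
    "${SPACEDUST:-spacedust}" createsetdb "$@" setDB tmpFolder
}

# Example call (not executed here):
#   run_createsetdb_from_faa .
```

Setting SPACEDUST=echo prints the command line instead of running it, which is a cheap way to confirm which files would be passed.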

Hi @RuoshiZhang, I can reproduce the same error with both NCBI GFF3 files and Prodigal-produced GFF3 files. Is there any update on this?

resulelginembl, Jun 02 '25 09:06

I get the same error with the .fna and .gff3 files from the JAVZHU010000001 entry on NCBI: https://www.ncbi.nlm.nih.gov/nuccore/JAVZHU010000001.1

SamuelSchwab, Jul 11 '25 14:07

I'm encountering the same error when running spacedust createsetdb with a genome .fna file and its corresponding .gff annotation downloaded from NCBI:

bin/spacedust: Argument list too long
Error: gff2db failed

JinwnK, Jul 30 '25 08:07

Hi, is there any update on this bug? I'm getting a similar error message with the example .fna and .gff files provided in this repo: "Not enough GFF files are provided. Some results might be omitted".

I am using the latest spacedust, foldseek, and mmseqs binaries, by the way.

E.g.:

spacedust createsetdb examples/uvig_120081.fna examples/uvig_255655.fna setDB/example_trial tmp --gff-dir examples/gff.txt --gff-type CDS
createsetdb examples/uvig_120081.fna examples/uvig_255655.fna setDB/example_trial tmp --gff-dir examples/gff.txt --gff-type CDS

MMseqs Version:          	5358214da8764737aa01af485b682729bb8d3ace
Database type            	0
Shuffle input database   	false
Createdb mode            	0
Write lookup file        	1
Offset of numeric ids    	0
Compressed               	0
Verbosity                	3
Min codons in orf        	30
Max codons in length     	32734
Max orf gaps             	2147483647
Contig start mode        	2
Contig end mode          	2
Orf start mode           	1
Forward frames           	1,2,3
Reverse frames           	1,2,3
Translation table        	1
Translate orf            	0
Use all table starts     	false
Create lookup            	0
Threads                  	12
Add orf stop             	false
GFF type                 	CDS
Statistics to be computed	
Tsv                      	false
File Inclusion Regex     	.*
File Exclusion Regex     	^$
gff dir file             	examples/gff.txt

createdb examples/uvig_120081.fna examples/uvig_255655.fna tmp/15431538959630537525/seqDB --dbtype 0 --shuffle 0 --createdb-mode 0 --write-lookup 1 --id-offset 0 --compressed 0 -v 3

Converting sequences

Time for merging to seqDB_h: 0h 0m 0s 0ms
Time for merging to seqDB: 0h 0m 0s 0ms
Database type: Nucleotide
Time for processing: 0h 0m 0s 3ms
Input DB type is Nucleotide.
gff2db examples/uvig_120081.gff tmp/15431538959630537525/seqDB /home/james/Documents/biotools/spacedust/setDB/example_trial_nucl --gff-type CDS --id-offset 0 --threads 12 -v 3

Not enough GFF files are provided. Some results might be omitted
[=================================================================] 100.00% 1 eta -
Time for merging to example_trial_nucl.lookup: 0h 0m 0s 1ms
Time for merging to example_trial_nucl_h: 0h 0m 0s 0ms
Time for merging to example_trial_nucl: 0h 0m 0s 1ms
Found these feature types and counts:
 - CDS: 53
Time for processing: 0h 0m 0s 27ms
translatenucs /home/james/Documents/biotools/spacedust/setDB/example_trial_nucl /home/james/Documents/biotools/spacedust/setDB/example_trial --translation-table 1 --add-orf-stop 0 -v 3 --compressed 0 --threads 12

[=================================================================] 100.00% 53 0s 1ms
Time for merging to example_trial: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 15ms
tsv2db /home/james/Documents/biotools/spacedust/setDB/example_trial_member_to_set.tsv /home/james/Documents/biotools/spacedust/setDB/example_trial_member_to_set --output-dbtype 5

Output database type: Alignment
Time for merging to example_trial_member_to_set: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 0ms
tsv2db /home/james/Documents/biotools/spacedust/setDB/example_trial_set_to_member.tsv /home/james/Documents/biotools/spacedust/setDB/example_trial_set_to_member --output-dbtype 5

Output database type: Alignment
Time for merging to example_trial_set_to_member: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 0ms
result2stats /home/james/Documents/biotools/spacedust/setDB/example_trial /home/james/Documents/biotools/spacedust/setDB/example_trial /home/james/Documents/biotools/spacedust/setDB/example_trial_set_to_member /home/james/Documents/biotools/spacedust/setDB/example_trial_set_size --stat linecount --tsv 0 --compressed 0 --threads 12 -v 3

[=================================================================] 100.00% 1 eta -
Time for merging to example_trial_set_size: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 2ms

The run doesn't fail, though, unlike in the other users' examples.

jlingford, Dec 08 '25 04:12
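
In the log above, gff2db is called with a single GFF file (examples/uvig_120081.gff) although two FASTA files were passed to createsetdb, which is consistent with the "Not enough GFF files are provided" warning. A quick sanity check before running createsetdb is to compare the number of FASTA inputs with the number of existing paths in the file given to --gff-dir. The sketch below does only that. The function name is hypothetical, and how spacedust actually pairs FASTA and GFF files is not established in this thread, so a passing check does not guarantee the warning goes away.

```shell
#!/bin/sh
# Sketch only: compare the number of FASTA inputs with the number of GFF
# paths listed in a --gff-dir file, and flag listed paths that do not exist.
# Usage: check_gff_list LIST_FILE FASTA [FASTA...]

check_gff_list() {
    list=$1
    shift
    n_fasta=$#
    n_gff=0
    missing=0
    # Read one path per line; also handle a last line without a newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue          # skip empty lines
        n_gff=$((n_gff + 1))
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "fasta=$n_fasta gff=$n_gff missing=$missing"
    # Succeed only if the counts match and every listed file exists.
    [ "$n_fasta" -eq "$n_gff" ] && [ "$missing" -eq 0 ]
}

# Example call (not executed here):
#   check_gff_list examples/gff.txt examples/uvig_120081.fna examples/uvig_255655.fna
```

A non-zero exit status means the list and the FASTA inputs disagree in count, or a listed GFF path does not resolve from the current working directory (relative paths in the list are a common cause).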