[BUG] GFF3 output file does not contain sequence-region name
Describe the bug The GFF3 (.gff) output of the PGAP pipeline omits the sequence name (the seqid) from the ##sequence-region line. This breaks downstream compatibility with programs that expect the name to be present. According to the GFF3 specification, the directive has the form "##sequence-region seqid start end": https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
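For anyone who wants to check their own files for this problem, here is a minimal sketch in Python. It assumes only the "##sequence-region seqid start end" form from the GFF3 specification; the function name is hypothetical and is not part of PGAP or any library.

```python
def has_complete_sequence_region(line):
    """Return True if a ##sequence-region directive carries seqid, start, and end.

    Hypothetical helper for illustration; not part of PGAP.
    """
    fields = line.split()
    if not fields or fields[0] != "##sequence-region":
        raise ValueError("not a ##sequence-region line")
    # A complete directive has four whitespace-separated fields:
    # the directive itself, the seqid, and two integer coordinates.
    if len(fields) != 4:
        return False
    return fields[2].isdigit() and fields[3].isdigit()
```

Called on the line PGAP currently writes ("##sequence-region 1 580076"), this returns False because the seqid field is missing.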
To Reproduce I ran PGAP using the test files provided on installation, using the following command:
sudo ./pgap.py -r -o mg37_results -g /root/.pgap/test_genomes/MG37/ASM2732v1.annotation.nucleotide.1.fasta -s 'Mycoplasmoides genitalium'
Expected behavior The first four lines of the annot.gff file that PGAP produced look like this:
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region 1 580076
But I would have expected them to look something like this:
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region L43967.2 1 580076
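Until a fixed release is available, a possible workaround is to patch the file after the run. The sketch below is only an illustration under one assumption: that each incomplete ##sequence-region line is followed, somewhere later in the file, by feature lines for that same sequence, so the seqid can be copied from column 1 of the next feature line. I have not verified that this holds for every PGAP output, and the function name is hypothetical.

```python
def fill_missing_seqids(lines):
    """Insert the seqid into ##sequence-region lines that lack one.

    Hypothetical workaround: the seqid is taken from column 1 of the
    next non-comment, non-empty line after the directive.
    """
    lines = [line.rstrip("\n") for line in lines]
    patched = []
    for i, line in enumerate(lines):
        fields = line.split()
        # An incomplete directive has only start and end after the keyword.
        if len(fields) == 3 and fields[0] == "##sequence-region":
            seqid = next(
                (later.split("\t")[0]
                 for later in lines[i + 1:]
                 if later.strip() and not later.startswith("#")),
                None,
            )
            if seqid is not None:
                line = f"##sequence-region {seqid} {fields[1]} {fields[2]}"
        patched.append(line)
    return patched
```

Lines that already contain a seqid are left unchanged, and a directive with no following feature line is passed through as-is rather than guessed at.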
Software versions (please complete the following information):
- Ubuntu
- PGAP version 2024-07-18.build7555 is up to date.
- Docker version 26.1.4, build 5650f9b
Thank you for your report, Andrea!
We will investigate this shortly.
We identified the application that caused this problem and opened another internal investigation (RW-2312).
The problem has been fixed, and the fix will appear in the next release.