[BUG] GFF3 output file does not contain sequence-region name

Open watsonar opened this issue 1 year ago • 2 comments

Describe the bug The .gff (GFF3) output of the PGAP pipeline omits the sequence name (seqid) on the ##sequence-region line. This hurts downstream compatibility with programs that expect region names. The name is a standard part of this directive according to the GFF3 specification: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

To Reproduce I ran PGAP using the test files provided on installation, using the following command:

sudo ./pgap.py -r -o mg37_results -g /root/.pgap/test_genomes/MG37/ASM2732v1.annotation.nucleotide.1.fasta -s 'Mycoplasmoides genitalium'

Expected behavior The first 4 lines of the annot.gff file produced look like this:

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region  1 580076

But I would have expected them to look something like this:

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region L43967.2 1 580076
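
For anyone who wants to check whether their own output is affected, here is a minimal Python sketch. It assumes only that a well-formed directive has the form `##sequence-region seqid start end`, as described in the GFF3 specification linked above. The function name `find_malformed_sequence_regions` is hypothetical and is not part of PGAP or annotwriter.

```python
def find_malformed_sequence_regions(lines):
    """Return (line_number, text) pairs for ##sequence-region directives
    that do not carry exactly three fields: seqid, start, end."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.startswith("##sequence-region"):
            continue
        # Drop the directive itself; what remains should be seqid, start, end.
        fields = line.split()[1:]
        well_formed = (
            len(fields) == 3
            and fields[1].isdigit()
            and fields[2].isdigit()
        )
        if not well_formed:
            problems.append((number, line))
    return problems


# Header as reported in this issue (seqid missing).
reported = [
    "##gff-version 3",
    "#!gff-spec-version 1.21",
    "#!processor NCBI annotwriter",
    "##sequence-region  1 580076",
]

# Header as expected (seqid present).
expected = [
    "##gff-version 3",
    "#!gff-spec-version 1.21",
    "#!processor NCBI annotwriter",
    "##sequence-region L43967.2 1 580076",
]

print(find_malformed_sequence_regions(reported))
print(find_malformed_sequence_regions(expected))
```

To run the same check on a real file, pass an open file handle instead of a list, for example `find_malformed_sequence_regions(open("annot.gff"))`.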

Software versions (please complete the following information):

  • Ubuntu
  • PGAP version 2024-07-18.build7555 is up to date.
  • Docker version 26.1.4, build 5650f9b

watsonar, Aug 02 '24 23:08

Thank you for your report, Andrea!

We will investigate this shortly.

azat-badretdin, Aug 02 '24 23:08

We identified the application that caused this problem and opened another internal investigation (RW-2312).

azat-badretdin, Aug 05 '24 19:08

The problem was fixed, and the fix will appear in the next release.

azat-badretdin, Aug 08 '24 18:08