gffread

Output issues for the -w, -x, and -y options

Open sjfleck opened this issue 4 years ago • 2 comments

My goal is to create a genome-guided transcriptome assembly using StringTie and use gffread to convert the output GTF into a GFF3. I seem to be able to create the .gff3 file without a problem, but I want to see how complete it is using BUSCO's transcriptome or protein mode. It seems like -y might be the best option for that, but I'm having a difficult time getting it to work. I also tried the -w and -x options, but only -w worked. Here are my commands:

hisat2-build -p 16 $REF $SAMPLE
hisat2 --max-intronlen 20000 -p 16 --dta -x $SAMPLE -1 $READS1 -2 $READS2 -S $SAMPLE.sam
samtools sort -@ 16 -o $SAMPLE.bam $SAMPLE.sam
stringtie $BAM -o $OUT -p 16
gffread $OUT > $SAMPLE.gff3

At this point, I have a .gff3 that seems to be fine, but when I run:

gffread $SAMPLE.gff3 -g $FASTA -w exons.fa -x cds.fa -y tr_cds.fa

I get a FASTA file with spliced exons for each transcript, but cds.fa and tr_cds.fa are both empty. If you have any guidance for getting this to work, I would appreciate it. Thank you, and thank you for creating all these tools.

sjfleck · Jun 08 '21 00:06

StringTie does not output any CDS features (only exon features), which are needed by the -x and -y options of gffread. You might want to run an ORF finder program (e.g. TransDecoder) in order to guess and assign likely CDS features to the StringTie output.

gpertea · Jun 08 '21 01:06
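
One quick way to confirm the point above is to count the feature types in column 3 of the annotation before running gffread with -x or -y. The sketch below is only an illustration: `count_features` is a hypothetical helper name (not part of gffread or StringTie), and it assumes a tab-separated GFF3 or GTF file with the usual nine columns.

```shell
# count_features is a hypothetical helper, not a gffread command.
# It prints each feature type (column 3) and how many times it occurs,
# skipping comment lines and lines with fewer than nine columns.
count_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Usage (file name taken from the commands in the question):
#   count_features "$SAMPLE.gff3"
```

If the output lists exon and transcript rows but no CDS row, the file has no coding annotation for -x and -y to extract, which would match the empty cds.fa and tr_cds.fa described in the question.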

Thank you for your quick feedback!

sjfleck · Jun 08 '21 13:06