Output issues for the -w, -x, and -y options
My goal is to create a genome-guided transcriptome assembly using StringTie and use gffread to convert the output GTF into a GFF3. I seem to be able to create the .gff3 file without a problem, but I want to see how complete it is using BUSCO's transcriptome or protein option. It seems like -y might be the best option for that, but I'm having a difficult time getting it to work. I also tried to use the -w and -x options, but only -w worked. Here are my commands:
hisat2-build -p 16 $REF $SAMPLE
hisat2 --max-intronlen 20000 -p 16 --dta -x $SAMPLE -1 $READS1 -2 $READS2 -S $SAMPLE.sam
samtools sort -@ 16 -o $SAMPLE.bam $SAMPLE.sam
stringtie $BAM -o $OUT -p 16
gffread $OUT > $SAMPLE.gff3
At this point, I have a .gff3 that seems to be fine, but when I run:
gffread $SAMPLE.gff3 -g $FASTA -w exons.fa -x cds.fa -y tr_cds.fa
I get a FASTA file with spliced exons for each transcript, but cds.fa and tr_cds.fa are both empty. If you have any guidance for getting this to work, I would appreciate it. Thank you, and thank you for creating all these tools.
StringTie does not output any CDS features (only exon features), and CDS features are required by the -x and -y options of gffread.
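A quick way to confirm this on your own file is to count the CDS rows in the feature-type column (column 3). The snippet below is only a minimal sketch; `count_cds` is a hypothetical helper name, and it assumes a tab-delimited GFF3 or GTF file.

```shell
# Count CDS features (column 3) in a tab-delimited GFF3/GTF file,
# skipping comment lines that start with '#'.
count_cds() {
  awk -F'\t' '!/^#/ && $3 == "CDS" { n++ } END { print n + 0 }' "$1"
}

# Usage (file name taken from the commands above):
#   count_cds "$SAMPLE.gff3"
```

If this prints 0, gffread has no CDS coordinates to work from, so empty -x and -y outputs are expected.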
You might want to run an ORF finder program (e.g. TransDecoder) in order to guess & assign likely CDS features to the StringTie output
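As a rough sketch of that suggestion, TransDecoder can be run directly on the spliced transcript FASTA that `gffread -w` already produced (exons.fa above). This assumes `TransDecoder.LongOrfs` and `TransDecoder.Predict` are installed and on your PATH; `predict_orfs` is a hypothetical wrapper name, and default settings may not suit every dataset.

```shell
# Sketch: predict likely coding regions on the spliced transcripts
# written by `gffread -w`. Assumes TransDecoder is installed and on PATH.
predict_orfs() {
  transcripts="$1"   # e.g. exons.fa from the gffread -w step
  # Step 1: extract long open reading frames from each transcript.
  TransDecoder.LongOrfs -t "$transcripts" &&
  # Step 2: predict the likely coding regions among those ORFs.
  TransDecoder.Predict -t "$transcripts"
}

# Usage:
#   predict_orfs exons.fa
```

TransDecoder typically writes its predictions next to the input name with `.transdecoder.pep` (peptides) and `.transdecoder.cds` (nucleotide CDS) suffixes, though the exact output names may depend on the version. The peptide file should then be usable with BUSCO's protein option.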
Thank you for your quick feedback!