Problems converting the GTF
Hi,
First of all, great tool!
I'm annotating a new genome for which I have Nanopore IsoSeq data. I initially predicted the protein-coding genes with BRAKER/AUGUSTUS. My idea now is to use that annotation as the reference when running IsoQuant on the aligned reads.
I have a couple of questions.
First, which of the two GTF files should I take as the final annotation: extended_annotation.gtf or transcript_models.gtf? As I understand it, the first one, extended_annotation.gtf, because I want to keep the non-expressed genes.
Second, I'm trying to convert the GTF into BED12/GFF3 in order to extract the nucleotide sequences and predict the CDS regions, but both files seem to be formatted in a way that Mikado cannot convert. I also tried one of my own scripts and found that the conversion is a bit weird.
How would you convert them?
Thanks a lot F
Dear @francicco
Thanks for the feedback!
First, which of the two GTF files should I take as the final annotation: extended_annotation.gtf or transcript_models.gtf? As I understand it, the first one, extended_annotation.gtf, because I want to keep the non-expressed genes.
You are right, extended_annotation.gtf contains all (including non-expressed) reference genes/transcripts + novel genes/transcripts discovered by IsoQuant.
Second, I'm trying to convert the GTF into BED12/GFF3 in order to extract the nucleotide sequences and predict the CDS regions, but both files seem to be formatted in a way that Mikado cannot convert. I also tried one of my own scripts and found that the conversion is a bit weird. How would you convert them?
I cannot recall converting IsoQuant output GTFs to any format other than a gffutils SQL database, which worked fine. In other cases I used gffread for conversion. Let me know if that helps; if not, I'll check Mikado for potential incompatibility.
Best Andrey
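For readers who hit the same conversion problem, here is a minimal standard-library sketch of a GTF to BED12 conversion. It is not IsoQuant's, gffread's, or Mikado's method. It assumes that every transcript is described by `exon` records carrying a `transcript_id "..."` attribute, and that no CDS is known yet, so the thick start and end are both set to the transcript start. The function name `gtf_to_bed12` is hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_to_bed12(gtf_lines):
    """Yield one BED12 line per transcript, built from its exon records."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    info = {}                  # transcript_id -> (chrom, strand)

    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon records are used; gene/transcript lines are skipped.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        # GTF is 1-based inclusive; BED is 0-based half-open.
        exons[tid].append((int(fields[3]) - 1, int(fields[4])))
        info[tid] = (fields[0], fields[6])

    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = info[tid]
        start, end = blocks[0][0], blocks[-1][1]
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - start) for s, _ in blocks)
        # No CDS is known here, so thickStart == thickEnd == start.
        row = [chrom, start, end, tid, 0, strand,
               start, start, "0", len(blocks), sizes, starts]
        yield "\t".join(str(x) for x in row)
```

To use it, pass an open file handle, for example `gtf_to_bed12(open("extended_annotation.gtf"))`, and write the yielded lines to a `.bed` file. The resulting BED12 can then be given to a tool that extracts spliced sequences from block coordinates, such as `bedtools getfasta -split`.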