Using the new --mix option for short and long-read data

Open niradsp opened this issue 4 years ago • 0 comments

I have previously used FLAIR for the so-called "hybrid" method, in which short-read data are used to correct long-read data. I wanted to test StringTie's new --mix option. Here is how I am running it: stringtie -G --mix -o <output.gtf> -p 10 -e

As I understand it, this will calculate the abundance of the annotated transcripts.
I am also wondering how the TPM value is calculated. Is it based more on the long-read data or on the short-read data? FLAIR, I think, is more "long-read centric"; in other words, its quantification is based on the Nanopore output. What about StringTie? Are its TPM values more short-read based or long-read based?

Another question: how do I extract tab-delimited TPM data? I am currently parsing each GTF file and extracting the ENST ID along with the TPM value. Is this fine, or is there a better method? I notice that the -A option gives me a file at the gene level, not the isoform level, and I am mainly interested in isoform usage.
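
For reference, the GTF parsing described above can be sketched roughly as follows. This is a minimal sketch, not an official StringTie utility: it assumes that each transcript record in the output GTF carries transcript_id, gene_id, and TPM in its attribute column, and the function names (transcript_tpms, write_tpm_table) are hypothetical names chosen for illustration.

```python
import csv
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_tpms(gtf_lines):
    """Yield (transcript_id, gene_id, TPM) for each transcript record.

    Assumption: TPM is stored as an attribute on "transcript" feature
    lines; exon lines and comment lines are skipped.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "TPM" not in attrs:
            continue
        yield (
            attrs.get("transcript_id", ""),
            attrs.get("gene_id", ""),
            float(attrs["TPM"]),
        )


def write_tpm_table(gtf_lines, out_handle):
    """Write a tab-delimited table with one row per transcript."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["transcript_id", "gene_id", "TPM"])
    for row in transcript_tpms(gtf_lines):
        writer.writerow(row)
```

To use it, open the GTF written by the -o option and pass the file handle together with an output handle, for example write_tpm_table(open("sample.gtf"), open("sample_tpm.tsv", "w")), where both file names are placeholders.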

With just the command above, I am getting far more isoforms than with FLAIR, and the isoforms that I found significant with FLAIR also appear in StringTie's output. Please let me know if the command above looks good.

Thanks in advance, Nirad

Aug 09 '21 18:08 niradsp