Cannot find gene ID in data for visualization
Hello!
Thanks a lot for developing IsoQuant! Has helped me a lot in my first steps in ONT data analysis!
Currently quantifying cDNA ONT data for 2 samples, and I have a small issue when trying to visualize some genes: The problem is that despite getting gene expression in TPM values (over >10 TPM) for 2 specific genes, I am getting this error:
Using transcript model quantification. Gene ENSG00000130477.16 not found in the data. Gene ENSG00000075043.20 not found in the data. self.reads_and_class structure: ({'*': 21514, 'antisense': 114485, 'full_splice_match': 20901411, 'genic': 122539, 'genic_intron': 238365, 'incomplete_splice_match': 13446974, 'intergenic': 786452, 'mono_exon_match': 2122285, 'novel_in_catalog': 1878340, 'novel_not_in_catalog': 11004136, 'undefined': 3}, {'ambiguous': 15339355, 'inconsistent': 2089504, 'inconsistent_ambiguous': 5030808, 'inconsistent_non_intronic': 5966495, 'intergenic': 760262, 'noninformative': 307550, 'unique': 17238431, 'unique_minor_difference': 3904099}) isoquant_26458614.out (END)
Code used for visualization:
outdir="/scratch/prj/bcn_pd_pesticides/Long-Reads-ALS/Alex_Nanopore_ALS/isoquant/"
genes="$outdir/pychopper/genes_isoquant.txt"
visualize.py` "$outdir" --counts --gene_list "$genes"
The gene_list looks like this:
ENSG00000104435.14 ENSG00000130477.16 ENSG00000145335.17 ENSG00000159556.10 ENSG00000016082.15 ENSG00000120948.19 ENSG00000075043.20
And here is the expression values (grouped, in TPM) for both genes missing in the data: ENSG00000130477.16 10.041741 23.522821 ENSG00000075043.20 10.328648 16.525185
Sorry for the long text! Thanks in advance for any help!
Best, Alex
EDIT: ENSG00000075043.20 gene plot was successful, was actually a typo (GTex Portal has the ID as .20, isoquant reports .21), sorry about that. Issue for the other gene still the same.
Dear @AlexFryd
Sorry for the delay! Also tagging the main visualizer developer @jackfreeman88
What if you try --ref_only option? I am not sure but it might use reference-based quantification (IsoQuant has 2 transcript tables - for reference features only and for discovered transcripts). I guess this option might switch to the first one.
Best Andrey
Hello!
I had a look on the .bam files on the ENSG00000130477 locus to see the mapped reads, and it appears that there are no reads spanning the entire major transcript(s), some canonical exons are not present in any of the reads (low number anyways), so that could be one reason. However, still cannot grasp the reason why I have counts/tpm values present in the respective files for both samples, yet they cannot be annotated in any transcript.
Best, Alex
@AlexFryd
Could you check .read_assignments.tsv and see if there are any reads assigned to the transcript from the gene of interest?
Best Andrey
Hello Andrey,
I am attaching the assigned reads for my gene of interest. It seems that all of them classify as ambiguous, inconsistent and in general, unreliable?
Best, Alex
@AlexFryd
I see quite a lot of uniquely assigned reads, although most of them are not spliced. It seems it was enough to assign them to a gene and transcript.
I think I'll add more quantification strategies in the future versions (e.g. FSM-only or spliced-only).
Best Andrey