How to deal with NAs in PSI matrix
Hi Eduardo,
I have a large number of samples and I used SUPPA2 to get PSI per isoform (psiPerIsoform).
My idea, apart from running a differential splicing analysis with SUPPA2, is to keep the matrix with all my samples and the PSI values for all the isoforms in order to use this particular software:
https://advances.sciencemag.org/content/7/14/eabd6991
It uses SUPPA to generate the splicing profile that goes into the main algorithm.
There are quite a few NAs in my matrix, let's say 10-20%. I know that SUPPA puts NA if the TPM coming from Salmon is 0. My question is how I can deal with the NAs so that I can use this matrix as input for downstream analysis, e.g. multi-omics as in the paper I am referring to: remove the transcripts with a majority of NAs across all samples? Replace them with 0?
Many thanks in advance
Toni
Hi,
NAs do not simply mean TPM = 0. If the numerator TPM is zero but the gene is expressed (other transcripts have TPM > 0), PSI = 0 is a meaningful value.
NAs mean that there is not sufficient expression in any of the transcripts of that gene to estimate the relative inclusion of the events. This can happen because of low sequencing depth or because the gene is not expressed.
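To make the difference between PSI = 0 and NA concrete, here is a minimal sketch in Python. It is a simplified illustration, not SUPPA's actual code: the function `isoform_psi` and the `min_gene_tpm` threshold are hypothetical names, and the sketch assumes isoform PSI is the transcript TPM divided by the summed TPM of all transcripts of the gene.

```python
import math

def isoform_psi(tpm_by_transcript, min_gene_tpm=0.0):
    """Relative abundance of each transcript within one gene, for one sample.

    Returns NaN for every transcript when the gene's total TPM is not above
    `min_gene_tpm` (a hypothetical threshold), because the ratio cannot be
    estimated without expression.
    """
    total = sum(tpm_by_transcript.values())
    if total <= min_gene_tpm:
        return {tx: math.nan for tx in tpm_by_transcript}
    return {tx: tpm / total for tx, tpm in tpm_by_transcript.items()}

# Gene is expressed: tx1 has TPM 0, so its PSI is a meaningful 0.
expressed = isoform_psi({"tx1": 0.0, "tx2": 8.0, "tx3": 2.0})

# Gene is not expressed at all: every PSI is undefined (NA).
silent = isoform_psi({"tx1": 0.0, "tx2": 0.0})
```

In the first gene the zero says that tx1 is not used, while in the second gene there is nothing to measure. That is why blindly replacing NA with 0 would mix two different situations.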
The question is whether the downstream methods can handle NAs or not. The methods used in SUPPA can handle some NAs, and you can control what proportion you accept. If the method you want to use cannot handle NAs, you can either eliminate those rows or use imputation to give an estimated value. Imputation could be done for rows with a limited number of NAs, following the distribution of the existing values across rows and columns.
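As a rough sketch of the two options (dropping rows, or imputing rows with few NAs), something like the following could work. The 20% cutoff, the function name `filter_and_impute`, and the use of the per-transcript median as the fill value are all assumptions made for illustration. The median is just a simple stand-in for a proper imputation method that also uses information across columns.

```python
import math
from statistics import median

def filter_and_impute(psi, max_na_fraction=0.2):
    """Drop transcripts with too many NAs, then fill the remaining NAs.

    `psi` maps a transcript ID to its list of PSI values (one per sample),
    with NaN marking NA. Rows whose NA fraction exceeds `max_na_fraction`
    (an assumed cutoff) are removed; remaining NAs are replaced with the
    median of the observed values in that row.
    """
    cleaned = {}
    for transcript, values in psi.items():
        if not values:
            continue
        observed = [v for v in values if not math.isnan(v)]
        na_fraction = (len(values) - len(observed)) / len(values)
        if not observed or na_fraction > max_na_fraction:
            continue  # too sparse to impute reliably
        fill = median(observed)
        cleaned[transcript] = [fill if math.isnan(v) else v for v in values]
    return cleaned

nan = math.nan
psi = {
    "tx1": [0.10, 0.20, nan, 0.30, 0.20],  # 1 of 5 NA: kept and imputed
    "tx2": [nan, nan, nan, 0.50, 0.60],    # 3 of 5 NA: dropped
    "tx3": [0.0, 0.0, 0.0, 0.0, 0.0],      # real zeros: left untouched
}
result = filter_and_impute(psi)
```

Note that true PSI = 0 values are kept as they are; only NAs are filled. For a real matrix you would probably read the .psi file into a data frame and use a more careful imputation, but the filtering logic stays the same.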
I hope this helps
Eduardo
-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ
Ok, thanks Eduardo
Yes I will try imputation.
Many Thanks!
Toni