
How to deal with NAs in PSI matrix

Open antgomo opened this issue 4 years ago • 2 comments

Hi Eduardo,

I have a large number of samples and I used SUPPA2 to get PSI per isoform (psiPerIsoform).

My idea, apart from running a differential splicing analysis with SUPPA2, is to keep the matrix with all my samples and the PSI values for all the isoforms, in order to use this particular software:

https://advances.sciencemag.org/content/7/14/eabd6991

It uses SUPPA to generate the splicing profile that feeds the main algorithm.

There are quite a few NAs in my matrix: not a huge number, but roughly 10-20%. I know that SUPPA puts NA if the TPM coming from Salmon is 0. My question is how I can deal with the NAs so that this matrix can be used as input for downstream analysis, i.e. multi-omics as in the paper I am referring to. Should I remove the transcripts that are NA in the majority of samples? Replace NAs with 0?

Many thanks in advance

Toni

antgomo avatar Jun 11 '21 20:06 antgomo

Hi,

NAs do not simply appear whenever TPM = 0. If the numerator TPM is zero but the gene is expressed (other transcripts have TPM > 0), PSI = 0 is a meaningful value.

NAs mean that there is not sufficient expression in any of the transcripts in that gene to estimate the relative inclusion of the events. This can happen because of low sequencing depth or because the gene is not expressed.
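To make the difference between PSI = 0 and NA concrete, here is a minimal sketch in Python. It is not SUPPA's actual code: the function name `isoform_psi` and the `min_total` cutoff are hypothetical, and it only assumes that the PSI of an isoform is its TPM divided by the summed TPM of all transcripts in the gene.

```python
import math


def isoform_psi(tpms, min_total=0.0):
    """Relative abundance (PSI) of each transcript within one gene.

    tpms: dict mapping transcript id -> TPM for a single gene and sample.
    min_total: hypothetical cutoff on the summed gene TPM; at or below it
    the ratio cannot be estimated, so every transcript gets NaN.
    """
    total = sum(tpms.values())
    if total <= min_total:
        # Gene not expressed (or too lowly expressed): PSI is undefined.
        return {tx: math.nan for tx in tpms}
    # Gene expressed: a transcript with TPM 0 gets a genuine PSI of 0.
    return {tx: tpm / total for tx, tpm in tpms.items()}


expressed = isoform_psi({"txA": 0.0, "txB": 8.0, "txC": 2.0})
silent = isoform_psi({"txD": 0.0, "txE": 0.0})
```

In the first gene `txA` has PSI 0 because the other transcripts are expressed; in the second gene both values are NA because there is nothing to divide by.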

The question is whether the downstream methods can handle NAs or not. The methods used in SUPPA can handle some NAs, and you can control what proportion you accept. If the method you want to use cannot handle NAs, you can either eliminate those rows or use imputation to give an estimated value. Imputation can be applied to rows with a limited number of NAs, following the distribution of the existing values across rows and columns.
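The two options above (dropping rows, or imputing rows with few NAs) can be sketched as follows in plain Python. This is only an illustration: the function name `filter_and_impute`, the 20% threshold and the use of the per-row median as the imputed value are all assumptions, and the row median is a much simpler stand-in for an imputation that uses the distribution across both rows and columns.

```python
import math
from statistics import median


def filter_and_impute(psi, max_na_fraction=0.2):
    """Drop rows with too many NAs, fill the rest with the row median.

    psi: dict mapping isoform id -> list of PSI values (one per sample),
    with missing values stored as NaN.
    max_na_fraction: hypothetical threshold; rows with a larger share of
    NAs are removed instead of imputed.
    """
    cleaned = {}
    for row_id, values in psi.items():
        observed = [v for v in values if not math.isnan(v)]
        na_fraction = (len(values) - len(observed)) / len(values)
        if not observed or na_fraction > max_na_fraction:
            continue  # too little information to impute this row
        fill = median(observed)
        cleaned[row_id] = [fill if math.isnan(v) else v for v in values]
    return cleaned


nan = math.nan
toy_matrix = {
    "tx1": [0.2, 0.4, nan, 0.6, 0.8],  # 1 of 5 missing: imputed
    "tx2": [nan, nan, nan, 0.5, 0.1],  # 3 of 5 missing: dropped
    "tx3": [0.0, 1.0, 0.5, 0.5, 0.5],  # complete: kept as is
}
result = filter_and_impute(toy_matrix)
```

Note that this fills NAs with a value taken from the observed samples rather than with 0, since a PSI of 0 would claim that the isoform is absent while the gene is expressed.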

I hope this helps

Eduardo


-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ

EduEyras avatar Jun 13 '21 03:06 EduEyras

Ok, thanks Eduardo

Yes I will try imputation.

Many Thanks!

Toni

antgomo avatar Jun 14 '21 14:06 antgomo