
Data format for bulk RNA-seq with 10X UMI data as reference: counts, TPM or FPKM?

Open Junjie-Hu opened this issue 2 years ago • 3 comments

Hi, after reading the tutorials carefully, I am still confused about how to prepare the input data. In most cases, users want to estimate cell-type fractions from tumor bulk RNA-seq data using 10X data as the reference. On the website, the author recommends setting datatype='counts', so is sc_ref the UMI count matrix of the 10X data? For bulkdata, should we use counts, TPM or FPKM data? Could you please add an example to the usage page of the website? For instance, deconvolution of a bulk PBMC dataset with 10X single-cell PBMC data.

Junjie-Hu avatar Sep 23 '23 07:09 Junjie-Hu

Should bulk TPM or FPKM data be log2 transformed?

Junjie-Hu avatar Sep 23 '23 07:09 Junjie-Hu

Hi Junjie,

Sorry for the late reply. The sc_ref argument is the single-cell reference data, whether it comes from 10X or any other sequencing technology. For the datatype argument, I suggest you use the default “counts” setting no matter what type the bulk data is. Any further questions are welcome!

Regards, Yanshuo

poseidonchan avatar Oct 01 '23 18:10 poseidonchan

Should bulk TPM or FPKM data be log2 transformed?

Raw (untransformed) TPM or FPKM data is better than log-transformed data.
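
If the bulk matrix you have was already log-transformed, a minimal sketch like the one below could bring it back to the linear scale before deconvolution. It assumes the transform was log2(x + 1); the pseudocount, the helper names `unlog2` and `looks_log_transformed`, and the threshold of 50 are illustrative assumptions and not part of TAPE, so check how your data was actually processed first.

```python
def unlog2(values, pseudocount=1.0):
    """Invert y = log2(x + pseudocount) for a list of expression values.

    Assumption: the data was transformed with base 2 and this pseudocount.
    Tiny negative results from rounding are clipped to 0.
    """
    return [max(2.0 ** y - pseudocount, 0.0) for y in values]


def looks_log_transformed(values, threshold=50.0):
    """Rough heuristic (an assumption, not a rule): linear TPM/FPKM usually
    contains values far above the threshold, log2-scale data does not."""
    return max(values) < threshold


# Hypothetical expression values for one sample, on the log2(TPM + 1) scale.
log_tpm = [0.0, 1.0, 3.0, 10.0]

if looks_log_transformed(log_tpm):
    linear_tpm = unlog2(log_tpm)
else:
    linear_tpm = list(log_tpm)

print(linear_tpm)
```

With a pandas DataFrame of log2(TPM + 1) values, the same inversion would be `2 ** df - 1`.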

poseidonchan avatar Oct 02 '23 18:10 poseidonchan