TAPE data format for bulk RNA-seq with 10X UMI data as reference: counts, TPM or FPKM?

Hi, after reading the tutorials carefully, I am still confused about how to prepare the input data. In most cases, users want to estimate cell-type fractions from tumor bulk RNA-seq data using 10X data as the reference. On the website, the author says to set datatype='counts', so is sc_ref the UMI count matrix of the 10X data? For the bulk data, should we use counts, TPM or FPKM? Could you please add an example to the usage page? For instance, deconvolution of a bulk PBMC dataset with 10X single-cell PBMC data.

Sep 23 '23 07:09 Junjie-Hu

Should bulk TPM or FPKM data be log2-transformed?

Sep 23 '23 07:09 Junjie-Hu

Hi Junjie,

Sorry for the late reply. The sc_ref argument is the single-cell data, whether it comes from 10X or another sequencing technology. For the datatype argument, I suggest keeping the default "counts" regardless of which type of bulk data you have. Any further questions are welcome!

Regards, Yanshuo

Oct 01 '23 18:10 poseidonchan

> Should bulk TPM or FPKM data be log2-transformed?

Raw (untransformed) TPM or FPKM values work better than log-transformed data.
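If your bulk matrix is only available as log2-transformed values, you can recover the linear scale before passing it in. Below is a minimal sketch assuming the values were stored as log2(x + 1), i.e. with a pseudocount of 1; check how your data was actually produced. The helper name `undo_log2p1` is hypothetical and not part of TAPE.

```python
import numpy as np

def undo_log2p1(log_values):
    """Hypothetical helper: invert log2(x + 1) back to linear TPM/FPKM (assumes pseudocount 1)."""
    arr = np.asarray(log_values, dtype=float)
    # np.exp2 computes 2**arr element-wise; subtracting 1 removes the pseudocount.
    linear = np.exp2(arr) - 1.0
    # Clip tiny negative values that floating-point rounding could introduce.
    return np.clip(linear, 0.0, None)

# Toy genes x samples matrix of log2(TPM + 1) values.
log_tpm = np.array([[0.0, 1.0], [3.0, 10.0]])
tpm = undo_log2p1(log_tpm)
print(tpm)  # [[   0.    1.] [   7. 1023.]]
```

If a different pseudocount or log base was used, adjust the inversion accordingly.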

Oct 02 '23 18:10 poseidonchan