Pre-Processing: Normalization / SpotClean / scIntegration?

Open Zockimonster opened this issue 7 months ago • 2 comments

Hi, thanks for providing a package allowing for cell type deconvolution!

I'm a bit confused about which steps are required before running RCTD, since the tutorial recommends running RCTD on untransformed counts. If I normalize the counts, aren't they already transformed in a sense? I'm also not sure whether using SpotClean before running RCTD is a good idea. Issue 147 (https://github.com/dmcable/spacexr/issues/147) suggests that it should work, but does anyone have experience using SpotClean before applying RCTD? Would you recommend doing so?

I also want to build a single-cell reference from distinct single-cell datasets (including scRNA and snRNA data). How much batch correction should be applied before running RCTD, given that the package also seems to correct some of the differences between technologies?

Maybe someone has some ideas. It is really quite confusing which steps are required at which point of an analysis...

Thanks & best regards, Eli

Zockimonster, Jul 02 '25 10:07

Hi Eli,

Thank you for your question. RCTD works on raw counts and has its own normalization method. You can filter cells or genes using your preferred approach. I don't have experience using SpotClean with RCTD, and it is not the recommended approach.

For the single-cell reference, you can simply aggregate the cells from multiple references. You may want to try to balance the number of cells across the references. By default, RCTD randomly subsamples cells from the reference if a cell type has more cells than a fixed threshold.
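For illustration, the balancing idea could be sketched as below. This is a minimal Python sketch under stated assumptions, not spacexr code: the function name `balance_reference`, the `(cell_id, dataset, cell_type)` tuples, and the cap of 100 cells are all hypothetical, and the cap is applied per dataset and cell type so that one large dataset does not dominate a cell type. Only cell identifiers are selected; the raw counts of the kept cells would then be merged unchanged.

```python
import random
from collections import defaultdict


def balance_reference(cells, max_per_group, seed=0):
    """Cap the number of cells per (dataset, cell_type) group.

    cells: list of (cell_id, dataset, cell_type) tuples (hypothetical layout).
    max_per_group: illustrative cap, not a spacexr default.
    Returns the kept tuples; groups below the cap are left untouched.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    groups = defaultdict(list)
    for cell in cells:
        _, dataset, cell_type = cell
        groups[(dataset, cell_type)].append(cell)

    kept = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > max_per_group:
            # Random subsample without replacement down to the cap.
            members = rng.sample(members, max_per_group)
        kept.extend(members)
    return kept


# Toy metadata: one large scRNA dataset and one smaller snRNA dataset.
cells = (
    [(f"sc_neuron_{i}", "scRNA", "Neuron") for i in range(500)]
    + [(f"sn_neuron_{i}", "snRNA", "Neuron") for i in range(50)]
    + [(f"sn_astro_{i}", "snRNA", "Astrocyte") for i in range(80)]
)
balanced = balance_reference(cells, max_per_group=100)

# Count the kept cells per (dataset, cell_type) group.
group_sizes = defaultdict(int)
for _, dataset, cell_type in balanced:
    group_sizes[(dataset, cell_type)] += 1
```

In this toy example the 500 scRNA neurons are cut to 100, while the two smaller groups keep all of their cells. Whether a per-dataset cap or a target proportion is more appropriate depends on how different the references are.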

Hope this helps!

Best, Dylan

dmcable, Jul 18 '25 23:07

Hi Dylan, many thanks for your response! This helps a lot! I think I'll then use the raw counts and simply merge them. So I'll use the integration only for checking and, where needed, adjusting the cell type labels. Do you think it is still advisable, even after adjusting the cell type labels, to balance the number of cells across the references so that each reference contributes a defined proportion of each cell type (where it applies)? Many thanks & best regards, Eli

Zockimonster, Jul 23 '25 08:07