celltypist multiple models

I would like to train CellTypist on two different datasets. Should I merge the two datasets and train a single model, or train two models and run the annotation twice?

Mar 22 '24 15:03 anke-king

@anke-king, if you train them separately, you will get two independent models. If you want to combine them for training, you have to unify their annotations so that cell type names are consistent. Both approaches are feasible (I personally prefer the former, as it is quicker and makes it easy to check whether the predictions from the two datasets agree).
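For the combined route, a minimal sketch might look like the following. It assumes both training sets are AnnData objects whose `.X` is log1p-normalized to 10,000 counts per cell (the input CellTypist expects) and whose labels sit in an `.obs` column. The column name `cell_type`, the `mapping` dictionary, and both function names are hypothetical and only there for illustration.

```python
def unify_labels(labels, mapping):
    """Rename labels via `mapping`; labels without an entry are kept unchanged."""
    return [mapping.get(label, label) for label in labels]


def train_combined(adata_a, adata_b, mapping, label_key="cell_type"):
    """Harmonize label names in both datasets, concatenate them, train one model."""
    # Imported lazily so unify_labels can be used without these packages.
    import anndata as ad
    import celltypist

    for adata in (adata_a, adata_b):
        # Make cell type names consistent across the two datasets.
        adata.obs[label_key] = unify_labels(adata.obs[label_key].astype(str), mapping)

    # Keep only genes shared by both datasets; suffix cell names to keep them unique.
    combined = ad.concat(
        [adata_a, adata_b],
        join="inner",
        label="dataset",
        keys=["a", "b"],
        index_unique="-",
    )
    # `labels` may be the name of a column in combined.obs.
    return celltypist.train(combined, labels=label_key)
```

For example, `unify_labels(["Tcell", "B cell"], {"Tcell": "T cell"})` gives `["T cell", "B cell"]`. The returned model can be saved with `model.write(...)` and reused for annotation.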

Mar 25 '24 22:03 ChuanXu1

Thank you for your reply! Just for clarification: I have one dataset with cell types for training and a second dataset with cell types that are not in the first dataset. In my target dataset (which I want to annotate with my custom model) I expect to see cell types from both datasets. So if I do the former, should I run the annotation twice and select the cell type based on the confidence score, or how would I get the consensus annotation?

Thanks!!

Mar 26 '24 12:03 anke-king

@anke-king, if the cell types in the first and second training datasets are completely different, you can combine them and train a single model. Confidence scores are not comparable across two different models, so if you use two models you need to inspect each set of predictions separately (celltypist.dotplot is useful in most cases) and judge based on your own knowledge.
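If you go with two models, one possible way to inspect them side by side is sketched below. It assumes `adata` is log1p-normalized to 10,000 counts per cell and already has a clustering column in `.obs` (called `leiden` here, which is an assumption). The helper names are hypothetical. It compares predicted labels rather than confidence scores, since the scores are not comparable between models.

```python
from collections import Counter


def label_pairs(labels_a, labels_b):
    """Count how often each (model A label, model B label) pair occurs per cell."""
    return Counter(zip(labels_a, labels_b))


def annotate_with_two_models(adata, model_a, model_b, cluster_key="leiden"):
    """Annotate the same data with two models and tabulate how their labels pair up."""
    # Imported lazily so label_pairs can be used without celltypist installed.
    import celltypist

    labels = {}
    for name, model in (("A", model_a), ("B", model_b)):
        pred = celltypist.annotate(adata, model=model, majority_voting=True)
        # Inspect each model on its own against the existing clusters.
        celltypist.dotplot(
            pred,
            use_as_reference=cluster_key,
            use_as_prediction="majority_voting",
        )
        labels[name] = pred.predicted_labels["majority_voting"].tolist()

    return label_pairs(labels["A"], labels["B"])
```

The resulting counts show which label from one model tends to co-occur with which label from the other. Deciding which one is correct for a given cluster still relies on marker genes and your own knowledge.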

Mar 30 '24 11:03 ChuanXu1

Hello! After following this suggestion, how do you recommend plotting the UMAP? In my case I have two datasets that should contain the same three cell types, one at 24 hours and one at 72 hours. My current pipeline is:

read 24h dataset --> normalize --> classify with CellTypist
read 72h dataset --> normalize --> classify with CellTypist
combine the normalized 24h and 72h data and apply `sc.pp.combat` with key 'dataset' (the dataset variable is 24h or 72h)
view the combined UMAP

Jul 26 '24 11:07 ManuelSokolov

@ManuelSokolov, you can try different integration methods for these two datasets and see how the celltypist predictions are overlaid on the umap.

Jul 28 '24 21:07 ChuanXu1

@ChuanXu1 if I understand correctly you mean:

Integrate 24h and 72h with the reference by dataset (using an integration method) into an object X
After integrating, extract the reference from X and use it for training
Extract 24h from X --> classify with CellTypist
Extract 72h from X --> classify with CellTypist

Jul 28 '24 21:07 ManuelSokolov

@ManuelSokolov, the first step is independent of the remaining three. You can annotate your data using CellTypist and add the prediction columns to .obs. After that, you can integrate your datasets, trying different methods (harmony, scVI, etc.).
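A rough sketch of that order (annotate first, integrate afterwards) could look like this. It assumes each dataset is a log1p-normalized AnnData (10,000 counts per cell) and uses Harmony via `scanpy.external`, which requires the `harmonypy` package. The function name, the `dataset` key, and the choice of Harmony are assumptions, and any other integration method could be swapped in. Highly variable gene selection and scaling are skipped for brevity.

```python
def annotate_then_integrate(adatas, model, batch_key="dataset"):
    """`adatas` maps a batch name (e.g. "24h") to a log1p-normalized AnnData."""
    # Imported lazily; these are third-party packages.
    import anndata as ad
    import celltypist
    import scanpy as sc
    import scanpy.external as sce

    annotated = []
    for adata in adatas.values():
        # Annotation runs on each dataset separately, before any batch correction.
        pred = celltypist.annotate(adata, model=model, majority_voting=True)
        # to_adata() adds predicted_labels, majority_voting, conf_score to .obs.
        annotated.append(pred.to_adata())

    combined = ad.concat(
        annotated,
        join="inner",
        label=batch_key,
        keys=list(adatas),
        index_unique="-",
    )

    # Integration only affects the embedding, not the stored predictions.
    sc.pp.pca(combined)
    sce.pp.harmony_integrate(combined, key=batch_key)  # writes X_pca_harmony
    sc.pp.neighbors(combined, use_rep="X_pca_harmony")
    sc.tl.umap(combined)

    # Overlay batch and CellTypist predictions on the integrated UMAP.
    sc.pl.umap(combined, color=[batch_key, "majority_voting"])
    return combined
```

Called as `annotate_then_integrate({"24h": adata_24h, "72h": adata_72h}, model)`, this keeps the CellTypist labels untouched and only changes how the cells are laid out, so different integration methods can be compared against the same predictions.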

Jul 28 '24 21:07 ChuanXu1