Cell type annotation for more than 3 subtypes
Dear developers,
When building a custom cell marker dataframe for cell types with more than 3 subtypes (e.g. different subtypes of T cells), what is the best approach? Do you plan to extend the software to support N subtypes, or hierarchical cell types (e.g. immune cells)? Thank you in advance.
Hi @gkc251,
Thank you for your question! I'd like to introduce mLLMCelltype, a new tool that addresses exactly the challenges you mentioned with annotating multiple cell subtypes and hierarchical cell types.
mLLMCelltype Features Relevant to Your Needs:
1. Unlimited Subtype Support
mLLMCelltype places no fixed cap on the number of cell subtypes. Whether you have 3, 10, or 50+ T cell subtypes, each cluster is annotated from its own marker gene profile by the multi-model consensus framework, so adding subtypes only means adding rows to your marker table.
2. Hierarchical Annotation Support
The tool naturally handles hierarchical cell type relationships. For example:
- Level 1: Immune cells vs Non-immune cells
- Level 2: T cells, B cells, Myeloid cells, etc.
- Level 3: CD4+ T cells, CD8+ T cells, Tregs, etc.
- Level 4: Naive CD4+, Memory CD4+, Th1, Th2, Th17, etc.
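To make the levels above concrete, here is a minimal sketch of one way to store such a hierarchy on your side and coarsen a fine label to any level. The `PARENT` map and the `lineage` and `at_level` helpers are hypothetical names for illustration only; they are not part of the mLLMCelltype API.

```python
# Hypothetical parent map built from the example levels above.
# This is an illustration, not mLLMCelltype's internal representation.
PARENT = {
    "T cells": "Immune cells",
    "B cells": "Immune cells",
    "Myeloid cells": "Immune cells",
    "CD4+ T cells": "T cells",
    "CD8+ T cells": "T cells",
    "Tregs": "T cells",
    "Naive CD4+": "CD4+ T cells",
    "Memory CD4+": "CD4+ T cells",
    "Th1": "CD4+ T cells",
    "Th2": "CD4+ T cells",
    "Th17": "CD4+ T cells",
}


def lineage(label):
    """Return the path from the top-level class down to `label`."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def at_level(label, level):
    """Coarsen `label` to a 1-based level; shallower labels are returned unchanged."""
    path = lineage(label)
    return path[min(level, len(path)) - 1]
```

With a map like this, a Level 4 annotation such as "Th17" can be reported as "T cells" at Level 2 when a coarser summary is enough.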
3. How It Works
- Input: Simply provide marker genes for each cluster/subtype in CSV/TSV/Excel format
- Processing: Multiple state-of-the-art LLMs (GPT-4.1, Claude 4, Gemini 2.5, etc.) analyze your markers
- Output: Consensus cell type annotations with confidence scores
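As a rough sketch of the input step, the snippet below writes per-cluster marker genes to CSV text using only the standard library. The column names `cluster` and `gene` and the helper `markers_to_csv` are assumptions made for this example; please check the documentation for the exact layout the tool expects.

```python
import csv
import io

# Example markers for three T cell clusters; the long "one row per gene"
# layout and the column names are assumptions, not a documented schema.
markers = {
    "Cluster_1": ["CD3D", "CD4", "IL7R", "CCR7"],
    "Cluster_2": ["CD3D", "CD4", "IL7R", "S100A4"],
    "Cluster_3": ["CD3D", "CD8A", "CCR7", "LEF1"],
}


def markers_to_csv(markers):
    """Serialize {cluster: [genes]} as CSV text with one row per (cluster, gene)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["cluster", "gene"])
    for cluster, genes in markers.items():
        for gene in genes:
            writer.writerow([cluster, gene])
    return buf.getvalue()


csv_text = markers_to_csv(markers)
```

Adding another subtype is then just another key in `markers`, which is why the number of subtypes does not need to be fixed in advance.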
4. Key Advantages for Complex Cell Types
- Context-aware: You can specify tissue type to improve subtype resolution
- Multi-model consensus: Reduces errors common when distinguishing similar subtypes
- Confidence metrics: Provides Consensus Proportion and Shannon Entropy to flag uncertain annotations
- No predefined limits: Works with any number of clusters/subtypes
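For intuition about the two confidence metrics, here is a minimal sketch that computes them from the labels several models assign to one cluster. It assumes Consensus Proportion is the share of models that agree with the most common label and Shannon Entropy is taken over the label distribution in bits; the exact definitions used inside mLLMCelltype may differ, and `consensus_metrics` is a hypothetical helper.

```python
from collections import Counter
from math import log2


def consensus_metrics(labels):
    """Return (top_label, consensus_proportion, shannon_entropy_bits).

    Assumed definitions for illustration; not taken from mLLMCelltype's code.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    top_label, top_count = counts.most_common(1)[0]
    proportion = top_count / total
    # Entropy is 0 when all models agree and grows as votes spread out.
    entropy = sum(-(c / total) * log2(c / total) for c in counts.values())
    return top_label, proportion, entropy
```

A cluster where every model answers "Tregs" has proportion 1.0 and entropy 0, while a three-way split among four votes gives a lower proportion and higher entropy, which is the kind of cluster worth reviewing by hand.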
Try It Now
Web Interface (No Installation)
Visit https://www.mllmcelltype.com to try it immediately with your data.
Python Package
pip install mllmcelltype
R Package (Great for Seurat users)
devtools::install_github("cafferychen777/mLLMCelltype")
# CRAN submission in progress
Example Use Case for T Cell Subtypes
# Your marker genes dataframe might look like:
# Cluster_1: CD3D, CD4, IL7R, CCR7 # Naive CD4+
# Cluster_2: CD3D, CD4, IL7R, S100A4 # Memory CD4+
# Cluster_3: CD3D, CD8A, CCR7, LEF1 # Naive CD8+
# Cluster_4: CD3D, CD8A, GZMK, DUSP2 # Memory CD8+
# Cluster_5: CD3D, CD4, FOXP3, IL2RA # Tregs
# ... and many more subtypes
import mllmcelltype

results = mllmcelltype.annotate(
    marker_genes_df,
    species="human",
    tissue="PBMC",  # Helps with subtype resolution
    models=["gpt-4.1", "claude-4", "gemini-2.5"],
)
Why mLLMCelltype Excels at Complex Annotations
- LLMs understand biological context: They've been trained on vast amounts of biological literature and understand subtle differences between cell subtypes
- Multi-model validation: Different models cross-check each other, reducing misclassification of similar subtypes
- Keeps pace with the literature: As model providers release LLM versions trained on more recent publications, newly described cell types can be recognized without changing your marker tables
Paper and Resources
- Paper: bioRxiv
- GitHub: https://github.com/cafferychen777/mLLMCelltype
- Documentation: Available on GitHub
Feel free to try it with your T cell data! The tool handles complex immune cell hierarchies particularly well since there's extensive literature on immune cell subtypes that the models can leverage.
Would love to hear about your experience if you give it a try!
Best regards, Chen