Implementing Molecular Cross Validation
Is your feature request related to a problem? Please describe.
Currently, MAGIC tends to oversmooth data when using automatic t selection and graph fitting parameters.
Describe the solution you'd like
Implement Molecular Cross Validation (MCV, https://www.biorxiv.org/content/10.1101/786269v1).
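MCV starts by splitting each cell's molecules into two disjoint halves. A minimal sketch of that split, assuming a dense integer count matrix (cells × genes) and a 50/50 split. The function name `molecular_split` is hypothetical and not part of MAGIC or the molecular-cross-validation package. Binomial thinning sends each molecule to x1 with probability p, otherwise to x2, so the halves always sum back to the original counts. For Poisson-distributed counts, the two halves are also independent Poisson variables.

```python
import numpy as np


def molecular_split(counts, p=0.5, seed=None):
    """Hypothetical helper: split integer counts into disjoint halves x1, x2.

    Each molecule is assigned to x1 with probability p, otherwise to x2,
    so x1 + x2 reproduces the input exactly.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    x1 = rng.binomial(counts, p)  # per-entry binomial thinning
    x2 = counts - x1              # remaining molecules
    return x1, x2


# Toy example on a small random count matrix (5 cells x 4 genes).
X = np.random.default_rng(0).poisson(2.0, size=(5, 4))
x1, x2 = molecular_split(X, seed=1)
```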
Additional context
Basic code flow:
0. Split the counts in each cell into x1 and x2 (disjoint sets of molecules)
1. Build the graph:
   - Library size normalize x1
   - Run PCA
   - Build the graph with a given knn and t
2. Create the diffusion operator, D
3. Apply the diffusion operator to the library size normalized x1
4. Multiply D(libnorm(x1)) by the library sizes of x2
5. Calculate the Poisson loss, λ - k ln(λ), where λ is the prediction from step 4 and k is the observed count in x2
6. Repeat for various values of knn and t
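The flow above can be sketched end to end in numpy. This is a minimal illustration under stated assumptions, not MAGIC's implementation: it uses a dense count matrix, a simplified Gaussian kNN kernel (bandwidth = distance to the knn-th neighbour) in place of MAGIC's alpha-decay kernel, and applies t only as the matrix power D^t. All function names (`molecular_split`, `library_normalize`, `pca`, `diffusion_operator`, `poisson_loss`, `mcv_sweep`) are hypothetical.

```python
import numpy as np


def molecular_split(counts, p=0.5, seed=None):
    # Step 0: binomial thinning; each molecule goes to x1 with probability p, else x2.
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(counts, p)
    return x1, counts - x1


def library_normalize(x):
    # Divide each cell (row) by its total count; all-zero cells stay zero.
    lib = x.sum(axis=1, keepdims=True)
    return np.divide(x, lib, out=np.zeros(x.shape), where=lib > 0)


def pca(x, n_components):
    # PCA scores via SVD of the column-centered matrix.
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def diffusion_operator(coords, knn):
    # Simplified kernel (assumption): Gaussian affinities with a per-cell bandwidth
    # equal to the distance to the knn-th neighbour, symmetrized, then
    # row-normalized into a Markov matrix D.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sigma = np.maximum(np.sort(dist, axis=1)[:, knn], 1e-12)  # column 0 is the cell itself
    affinity = np.exp(-((dist / sigma[:, None]) ** 2))
    affinity = (affinity + affinity.T) / 2
    return affinity / affinity.sum(axis=1, keepdims=True)


def poisson_loss(lam, k, eps=1e-10):
    # Mean Poisson negative log-likelihood, lambda - k*ln(lambda), dropping ln(k!).
    lam = np.maximum(lam, eps)
    return float(np.mean(lam - k * np.log(lam)))


def mcv_sweep(counts, knn_values, t_values, n_components=10, seed=0):
    counts = np.asarray(counts, dtype=np.int64)
    x1, x2 = molecular_split(counts, seed=seed)            # step 0
    x1_norm = library_normalize(x1.astype(float))          # step 1: libnorm
    coords = pca(x1_norm, n_components)                    # step 1: PCA
    lib2 = x2.sum(axis=1, keepdims=True)                   # library sizes of x2
    losses = {}
    for knn in knn_values:
        D = diffusion_operator(coords, knn)                # steps 1-2
        for t in t_values:
            smoothed = np.linalg.matrix_power(D, t) @ x1_norm      # step 3
            losses[(knn, t)] = poisson_loss(smoothed * lib2, x2)  # steps 4-5
    return losses


# Toy usage: sweep knn and t (step 6) and keep the lowest-loss pair.
rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(60, 30))
losses = mcv_sweep(X, knn_values=[5, 10], t_values=[1, 3, 6], n_components=5)
best_knn, best_t = min(losses, key=losses.get)
```

The dense pairwise-distance kernel is O(n²) in memory and only suitable for small toy data; a real implementation would presumably reuse MAGIC's own graph and cache it across values of t, since only the matrix power changes inside the inner loop.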
@dburkhardt I have some thoughts / materials on this courtesy of @batson and @jamestwebber. Happy for you to actually implement it of course :)
MAGIC Sweep: https://github.com/czbiohub/molecular-cross-validation/blob/master/src/molecular_cross_validation/scripts/magic_sweep.py
Similar, for a diffusion model: https://github.com/czbiohub/molecular-cross-validation/blob/master/src/molecular_cross_validation/scripts/diffusion_sweep.py
Talked to @dburkhardt about this today while he was here. The magic_sweep script is the most directly applicable one, but I'll also throw in the newly added mcv_sweep module and the Grid Search vignette notebook as additional resources.
The GridSearchMCV class should work with a little plumbing, but it can't do anything clever with caching, so it will be a lot slower than a more carefully engineered solution.