
Implementing Molecular Cross Validation

Open dburkhardt opened this issue 6 years ago • 2 comments

Is your feature request related to a problem? Please describe.
Currently, MAGIC tends to oversmooth data when using automatic t selection and graph fitting parameters.

Describe the solution you'd like
Implement Molecular Cross Validation (https://www.biorxiv.org/content/10.1101/786269v1)

Additional context
Basic code flow:

  0. Split the counts in each cell into x1 and x2 (disjoint sets)
  1. Build the graph
  2. Library-size normalize x1
  3. PCA
  4. Build the graph with a given knn and t
  5. Create the diffusion operator, D
  6. Apply the diffusion operator to the library-size-normalized x1
  7. Multiply D(libnorm(x1)) by the library sizes of x2
  8. Calculate the Poisson loss
     • λ - k ln(λ), where λ is the predicted count and k is the observed count in x2
  9. Repeat for various knn and t
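
The flow above can be sketched in plain NumPy. This is only an illustrative toy under several assumptions, not MAGIC's implementation: the split is a binomial draw with a hypothetical `frac=0.5`, the graph is an unweighted symmetric kNN graph rather than MAGIC's adaptive kernel, PCA is run directly on the normalized counts, everything is dense, and all function names (`split_counts`, `libnorm`, `diffusion_operator`, `poisson_loss`, `mcv_sweep`) are made up for this example.

```python
import numpy as np

def split_counts(counts, frac=0.5, rng=None):
    """Step 0: binomially split integer counts into disjoint parts x1 and x2."""
    rng = np.random.default_rng(rng)
    x1 = rng.binomial(counts, frac)
    return x1, counts - x1

def libnorm(x):
    """Divide each cell (row) by its library size; return (normalized, libsize)."""
    libsize = x.sum(axis=1, keepdims=True).astype(float)
    return x / np.maximum(libsize, 1.0), libsize

def diffusion_operator(data, knn, n_pca=20):
    """PCA, then a kNN graph, then a row-stochastic matrix D (toy sizes only)."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    n_pca = min(n_pca, s.size)
    pcs = u[:, :n_pca] * s[:n_pca]
    # Pairwise squared Euclidean distances in PCA space.
    sq = (pcs ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * pcs @ pcs.T
    np.fill_diagonal(dist, np.inf)  # exclude self from the neighbor search
    nbrs = np.argsort(dist, axis=1)[:, :knn]
    n = data.shape[0]
    adj = np.eye(n)  # self-loops keep every row sum positive
    adj[np.repeat(np.arange(n), knn), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # symmetrize
    return adj / adj.sum(axis=1, keepdims=True)

def poisson_loss(lam, k, eps=1e-12):
    """Mean of lam - k*ln(lam): Poisson negative log-likelihood up to a constant."""
    return float(np.mean(lam - k * np.log(lam + eps)))

def mcv_sweep(counts, knn_values, t_values, seed=0):
    """Return {(knn, t): loss on x2} for a model fitted on x1 only."""
    x1, x2 = split_counts(counts, rng=seed)
    x1_norm, _ = libnorm(x1)
    _, lib2 = libnorm(x2)
    losses = {}
    for knn in knn_values:
        D = diffusion_operator(x1_norm, knn)
        for t in t_values:
            smoothed = np.linalg.matrix_power(D, t) @ x1_norm
            # Rescale to the held-out library sizes before scoring against x2.
            losses[(knn, t)] = poisson_loss(smoothed * lib2, x2)
    return losses

# Usage on synthetic counts (cells x genes); pick the (knn, t) with lowest loss.
counts = np.random.default_rng(0).poisson(2.0, size=(60, 30))
losses = mcv_sweep(counts, knn_values=[3, 5], t_values=[1, 2, 3])
best_knn, best_t = min(losses, key=losses.get)
```

Because D is row-stochastic and each row of the normalized x1 sums to 1, each smoothed row also sums to 1, so multiplying by the x2 library sizes gives predicted counts on the same scale as x2.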

dburkhardt, Oct 30 '19 02:10

@dburkhardt I have some thoughts / materials on this courtesy of @batson and @jamestwebber. Happy for you to actually implement it of course :)

MAGIC Sweep: https://github.com/czbiohub/molecular-cross-validation/blob/master/src/molecular_cross_validation/scripts/magic_sweep.py

Similar, for a diffusion model: https://github.com/czbiohub/molecular-cross-validation/blob/master/src/molecular_cross_validation/scripts/diffusion_sweep.py

scottgigante, Oct 30 '19 13:10

Talked to @dburkhardt about this today while he was here. The magic_sweep is the most directly applicable script for this but I'll throw in the newly-added mcv_sweep module and Grid Search vignette notebook as additional resources.

The GridSearchMCV class should work with a little plumbing, but it can't do anything clever with caching and so it'll be a lot slower than a more carefully engineered solution.
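
As an illustration of the kind of caching a hand-written sweep could exploit (this is an assumption about what "clever" might mean here, not a description of GridSearchMCV or of the linked scripts): for a fixed knn the diffusion operator D does not change, so the smoothed data for each t can be obtained from the previous t with a single extra product instead of recomputing D^t from scratch. The helper name `sweep_t_cached` is hypothetical.

```python
import numpy as np

def sweep_t_cached(D, x_norm, t_max):
    """Yield (t, D^t @ x_norm) for t = 1..t_max, reusing the previous result."""
    smoothed = x_norm
    for t in range(1, t_max + 1):
        smoothed = D @ smoothed  # one product per step; D^t is never formed
        yield t, smoothed

# Usage with a small random row-stochastic matrix standing in for D.
rng = np.random.default_rng(0)
D = rng.random((6, 6))
D /= D.sum(axis=1, keepdims=True)
x_norm = rng.random((6, 4))
results = dict(sweep_t_cached(D, x_norm, t_max=4))
```

The same idea extends one level up: the PCA and the distance computation depend only on x1, so they could be computed once and shared across all knn values.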

jamestwebber, Oct 30 '19 23:10