Much faster versions of PCA + UMAP exist. Can we implement them?

Open Hellisotherpeople opened this issue 1 year ago • 3 comments

Intel's scikit-learn extension (scikit-learn-intelex), cuML, and several other libraries should have optimized variants of PCA, and likely of a few other algorithms used here. I can submit a PR implementing a few of these if you'd like. For certain types of vectors, this can give a noticeable speedup.

GPU implementations might harm reproducibility. There might be other issues too that I haven't thought about. Thoughts?

Nov 24 '24 17:11 Hellisotherpeople

I'd be interested in a PR. If you do one, please implement them as a new method, like the existing umap one, for now. I'd be especially interested in a speed comparison!

Dec 14 '24 06:12 vgel
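For the speed comparison asked for above, here is a minimal timing sketch, not the project's actual benchmark. It assumes synthetic float32 data of an arbitrary shape and `n_components=1`. `benchmark`, `compare`, and `build_candidates` are hypothetical helper names, and the sklearnex entry is only added if that package happens to be installed.

```python
import time
from statistics import median


def benchmark(fit_fn, repeats=3):
    """Median wall-clock seconds over `repeats` calls of fit_fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit_fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


def compare(candidates, repeats=3):
    """Time each named callable; return {name: (seconds, speedup vs. first entry)}."""
    seconds = {name: benchmark(fn, repeats) for name, fn in candidates.items()}
    if not seconds:
        return {}
    baseline = next(iter(seconds.values()))
    return {
        name: (secs, baseline / secs if secs > 0 else float("inf"))
        for name, secs in seconds.items()
    }


def build_candidates():
    """Collect whichever PCA backends are importable (none is required)."""
    candidates = {}
    try:
        import numpy as np
        from sklearn.decomposition import PCA
    except ImportError:
        return candidates
    # Synthetic stand-in data (assumption); real inputs would be the project's vectors.
    X = np.random.default_rng(0).standard_normal((2000, 256)).astype(np.float32)
    candidates["sklearn"] = lambda: PCA(n_components=1).fit(X)
    try:
        from sklearnex.decomposition import PCA as AcceleratedPCA
        candidates["sklearnex"] = lambda: AcceleratedPCA(n_components=1).fit(X)
    except ImportError:
        pass  # accelerated backend not installed; only the baseline is timed
    return candidates


if __name__ == "__main__":
    for name, (secs, speedup) in compare(build_candidates()).items():
        print(f"{name}: {secs:.4f}s ({speedup:.2f}x vs. baseline)")
```

The first entry in the dict is treated as the baseline, so any other backend (cuML, for example) could be added as one more callable without changing the harness.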

> GPU implementations might harm reproducibility

So does setting n_jobs to anything other than 1, so I'd say there's definitely a performance vs. reproducibility tradeoff.

Dec 14 '24 08:12 thiswillbeyourgithub
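One way to make the reproducibility side of that tradeoff measurable is to run the same fit several times and compare the outputs. The sketch below is only an illustration: `is_reproducible` and `max_abs_diff` are hypothetical helpers, and the seeded/unseeded random generators stand in for a real estimator run with and without a fixed seed.

```python
import random


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def is_reproducible(run_fn, runs=3, atol=1e-6):
    """Call run_fn() `runs` times; True if every output matches the first within atol."""
    reference = list(run_fn())
    for _ in range(runs - 1):
        output = list(run_fn())
        if len(output) != len(reference):
            return False
        if max_abs_diff(reference, output) > atol:
            return False
    return True


def seeded():
    # Stand-in for a fit with a fixed random_state.
    rng = random.Random(0)
    return [rng.random() for _ in range(5)]


def unseeded():
    # Stand-in for a fit whose randomness or thread scheduling is not pinned.
    rng = random.Random()
    return [rng.random() for _ in range(5)]


if __name__ == "__main__":
    print("seeded reproducible:", is_reproducible(seeded))
    print("unseeded reproducible:", is_reproducible(unseeded))
```

For a real backend, `run_fn` would return the flattened fitted components or embedding, and `atol` would need to be loose enough to tolerate floating-point reordering if bitwise equality is not the goal.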

Btw this exists:

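from sklearnex import patch_sklearn
# Per the docs linked below, global_patch=True patches the installed
# scikit-learn itself rather than only the current process.
patch_sklearn(global_patch=True)
import sklearn  # import after patching so the accelerated estimators are used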

Source: https://uxlfoundation.github.io/scikit-learn-intelex/latest/global-patching.html

Dec 14 '24 08:12 thiswillbeyourgithub
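If patching the whole installation is too invasive, a process-local, opt-in variant could look like the sketch below. It is an assumption-laden illustration: `enable_sklearn_acceleration` is a hypothetical helper, and passing a list of estimator names to `patch_sklearn` reflects my understanding of selective patching, so check it against the sklearnex docs before relying on it.

```python
import importlib.util


def enable_sklearn_acceleration(estimators=None):
    """Patch scikit-learn for this process only, if sklearnex is installed.

    Returns True when the patch was applied, False when sklearnex is missing.
    """
    if importlib.util.find_spec("sklearnex") is None:
        return False  # fall back to stock scikit-learn
    from sklearnex import patch_sklearn

    if estimators is None:
        patch_sklearn()  # patch everything sklearnex supports
    else:
        # Assumption: a list of names such as ["PCA"] limits the patch.
        patch_sklearn(estimators)
    return True


if __name__ == "__main__":
    accelerated = enable_sklearn_acceleration()
    print("sklearnex patch applied:", accelerated)
```

The call needs to happen before any `from sklearn... import ...` statements, since estimators imported earlier keep pointing at the unpatched classes.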