anndata Native support for CuPy/cuDF-backed AnnData

This would be immensely useful for the GPU data science community as it would start to enable pipelines fully on GPU.

Apr 21 '20 21:04 cjnolet

This would be great. I'm not sure how we'd run it on CI though. AFAIK cupy and cudf don't have a "mock gpu" backend at the moment, right?
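
One partial workaround, as a minimal sketch: this does not mock a GPU, it only checks whether one is usable so that GPU tests can be skipped on CPU-only CI runners. The helper name `gpu_available` is hypothetical, and the sketch assumes `cupy.cuda.runtime.getDeviceCount` is an acceptable way to detect a device.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True only if CuPy imports and reports at least one CUDA device.

    Hypothetical helper for illustration; not part of anndata or CuPy.
    """
    # Cheap check first: is the cupy package installed at all?
    if importlib.util.find_spec("cupy") is None:
        return False
    try:
        import cupy

        # Raises if there is no usable CUDA driver/runtime on this machine.
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # Broken install, missing driver, etc.: treat as "no GPU".
        return False
```

With pytest, something like `pytest.mark.skipif(not gpu_available(), reason="needs a CUDA device")` could then gate the GPU-only tests, so the CPU test suite still runs everywhere.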

Apr 29 '20 10:04 ivirshup

Looks like the uarray project could be helpful in implementing the cupy side of this.

As for dataframes, this conversation on the ossdata discourse is probably worth following.

May 15 '20 04:05 ivirshup

Any update on this issue?

Nov 13 '21 03:11 daxiongshu

I've opened #1080 to track cupy support. @Intron7, what do you think about cuDF support? Is it a high priority?

I'm kinda eyeing the dataframe-api, which maybe we could leverage to support more dataframe types. If that pans out, we could go for cuDF support via that.

Jul 27 '23 17:07 ivirshup

@cjnolet @ivirshup I think in general, it's a good idea to support cudf in the long term. cudf is insanely fast when it comes to correlations (significantly faster than cupy) and other math-related tasks. However, as far as I know, it still has some issues with apply and categorical data. I attempted to use cudf for the GPU port of squidpy's ligrec, but it lacked some key features. What would be advantageous in the future is a fully dynamic anndata where you can seamlessly switch everything in and out of VRAM, and cudf has the potential to assist with that.

It would be beneficial to load data directly to the GPU from text files. Unfortunately, h5 files are not yet accelerated. Currently, I'm also concerned about VRAM in general. There isn't much heavy math computation going on in dataframes for anndata.

The area where I foresee the most immediate benefit is likely the creation of .X, .layers, etc. from cudf.DataFrame objects.
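
A rough sketch of what that conversion might look like, assuming `cudf.DataFrame.to_cupy()` for the GPU path and `pandas.DataFrame.to_numpy()` as the host fallback. `frame_to_matrix` is a hypothetical helper, not anndata API, and it assumes every column is numeric.

```python
def frame_to_matrix(df):
    """Turn a dataframe into a dense 2-D array without leaving its device.

    Hypothetical helper for illustration only.
    """
    # cudf.DataFrame exposes to_cupy(), which keeps the data in VRAM.
    # Check it first, because cuDF frames also have to_numpy(), which
    # would copy everything back to host memory.
    if hasattr(df, "to_cupy"):
        return df.to_cupy()
    # pandas.DataFrame (or anything else with to_numpy) stays on the host.
    return df.to_numpy()


# Possible usage, assuming an anndata version that accepts CuPy arrays as X:
#   adata = anndata.AnnData(X=frame_to_matrix(gdf))
#   adata.layers["counts"] = frame_to_matrix(counts_gdf)
```

Dispatching on the method rather than on the type avoids importing cudf on machines that don't have it installed.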

Jul 27 '23 17:07 Intron7