
Any tips on how to use a GPU to run BERTopic on GCP?

Open jpwavely opened this issue 3 years ago • 3 comments

@MaartenGr Are there any special tips for running the BERTopic Python code (model fit) on GCP Vertex AI Jupyter notebooks? I ran it on GCP and monitored CPU and GPU usage, but saw no sign that the GPU was being used. Do I need to add extra code to enable GPU usage?

-James P.

jpwavely avatar Sep 21 '22 00:09 jpwavely

BERTopic is made up of several algorithms, some of which support GPU acceleration and others do not. When embedding the documents, the GPU should automatically be used if you are using SentenceTransformers. By default, UMAP and HDBSCAN do not support GPU acceleration. However, you can use the GPU-accelerated versions from cuML instead.
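As a rough sketch of what that could look like (not a tested recipe for Vertex AI): the helper names `pick_device` and `build_topic_model` are hypothetical, the embedding model name and the UMAP/HDBSCAN parameter values are illustrative assumptions, and the cuML path assumes a RAPIDS install that matches the notebook's CUDA version. The first helper also gives a quick way to check whether PyTorch can see the GPU at all.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and can see a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


def build_topic_model(device: str):
    """Build a BERTopic model, using cuML's UMAP/HDBSCAN when available.

    Hypothetical helper: imports are deferred so this file still loads on a
    machine without bertopic, sentence-transformers, or cuml installed.
    """
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    # Embedding step: SentenceTransformers runs on the requested device.
    # The model name is an assumption; any SentenceTransformers model works.
    embedding_model = SentenceTransformer("all-MiniLM-L6-v2", device=device)
    kwargs = {"embedding_model": embedding_model}

    # Dimensionality reduction and clustering: swap in the cuML versions
    # only if a GPU is visible and cuml is importable; otherwise BERTopic
    # falls back to its default CPU UMAP and HDBSCAN.
    if device == "cuda" and importlib.util.find_spec("cuml") is not None:
        from cuml.cluster import HDBSCAN
        from cuml.manifold import UMAP

        # Parameter values below are illustrative, not tuned.
        kwargs["umap_model"] = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
        kwargs["hdbscan_model"] = HDBSCAN(
            min_samples=10, gen_min_span_tree=True, prediction_data=True
        )

    return BERTopic(**kwargs)
```

In a notebook, something like `topic_model = build_topic_model(pick_device())` followed by `topic_model.fit_transform(docs)` would then run the embedding step on the GPU when `pick_device()` returns `"cuda"`, where `docs` is your own list of strings.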

MaartenGr avatar Sep 21 '22 15:09 MaartenGr

You may or may not be aware, but beckernick and cjnolet have done a lot of work on the cuML GPU version of HDBSCAN (All points membership vector for HDBSCAN and approximate_predict function for HDBSCAN). This should make it much easier to run BERTopic on a GPU and to access more of its features in the future. There are still some open issues (New joblib breaks hdbscan in GPU jobs, Reduce memory pressure in HDBSCAN all_points_membership_vectors, and API mismatch), but some of these should be resolved by the time RAPIDS AI 22.10 is released next month.

I just used the nightly build yesterday. It took some doing to get it working on the GPU, but it wasn't as much work as I was expecting. I haven't yet tried some features that I still plan on using/testing (e.g. hierarchical topics, visualizations), but so far it seems to work pretty well.

Side note: I put this here partly so that I have it all in one place, and partly to make MaartenGr aware of it (though I wouldn't be surprised if he already is).

ldsands avatar Sep 21 '22 16:09 ldsands

Thank you @MaartenGr and @ldsands for your valuable feedback!

jpwavely avatar Sep 21 '22 16:09 jpwavely

@ldsands Thanks for the write-up and for providing several links to interesting features/fixes with respect to what is being done at RAPIDS AI 😄

MaartenGr avatar Sep 24 '22 08:09 MaartenGr