cedivad
HDBSCAN is important, but don't forget UMAP! I'm still trying to optimise my parameters. I tried about 15 runs with different parameters yesterday, and none of them was particularly successful,...
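A sweep like this can be organised as a small grid over the main UMAP and HDBSCAN knobs. The sketch below is hypothetical: the value ranges are illustrative assumptions, not the settings used in this thread. The helper `build_topic_model` only shows how each configuration could be handed to BERTopic through its `umap_model` and `hdbscan_model` arguments, and it needs `umap-learn`, `hdbscan` and `bertopic` installed.

```python
from itertools import product

# Illustrative value ranges (assumptions, not the poster's settings).
# 3 x 5 = 15 configurations, about the size of the sweep described above.
N_NEIGHBORS = [15, 30, 50]
MIN_CLUSTER_SIZE = [50, 100, 200, 500, 1000]


def parameter_grid():
    """Yield one dict of UMAP/HDBSCAN keyword arguments per run."""
    for n_neighbors, min_cluster_size in product(N_NEIGHBORS, MIN_CLUSTER_SIZE):
        yield {
            # n_components=12 mirrors the 12-dimensional embeddings mentioned in the thread.
            "umap": {"n_neighbors": n_neighbors, "n_components": 12,
                     "min_dist": 0.0, "metric": "cosine"},
            # prediction_data=True keeps what approximate_predict needs later.
            "hdbscan": {"min_cluster_size": min_cluster_size,
                        "metric": "euclidean", "prediction_data": True},
        }


def build_topic_model(cfg):
    """Build a BERTopic model for one configuration.

    Imports are deferred so the grid itself can be inspected without
    these packages installed.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    return BERTopic(umap_model=UMAP(**cfg["umap"]),
                    hdbscan_model=HDBSCAN(**cfg["hdbscan"]))


if __name__ == "__main__":
    configs = list(parameter_grid())
    print(len(configs))
```

Keeping the grid separate from model construction makes it easy to log each configuration next to whatever quality metric is used to judge a run.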
I'm happy to share anything you need, but I doubt it will be of much use. I'm working with a heterogeneous collection of ~300M threads from narkive.com. My latest...
A small update, as requested :) After training, I ended up with roughly 12k topics, using 12-dimensional embeddings produced by UMAP reduction (on a 4M sample of...
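Fitting the reducer on a sample and then projecting the full collection is one way to make this scale. A minimal sketch, assuming the reducer exposes a scikit-learn-style `transform` method (as `umap.UMAP` does); the helper names, seed and chunk size are hypothetical choices for illustration.

```python
import numpy as np


def sample_indices(n_total, n_sample, seed=0):
    """Pick a reproducible random subset of rows to fit the reducer on."""
    rng = np.random.default_rng(seed)
    size = min(n_sample, n_total)
    return np.sort(rng.choice(n_total, size=size, replace=False))


def transform_in_chunks(reducer, X, chunk_size=100_000):
    """Apply reducer.transform to X piece by piece to bound peak memory."""
    parts = [reducer.transform(X[start:start + chunk_size])
             for start in range(0, len(X), chunk_size)]
    return np.vstack(parts)


# Typical use (requires umap-learn; sizes are illustrative):
#   idx = sample_indices(len(embeddings), 4_000_000)
#   reducer = umap.UMAP(n_components=12).fit(embeddings[idx])
#   reduced = transform_in_chunks(reducer, embeddings)
```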
I think BERTopic is surprisingly fast out of the box, at least when you only care about the two models (UMAP and HDBSCAN) and discard all of the tf-idf data etc. Speed depends...
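Inference that keeps only the two models reduces to projecting new embeddings with the fitted UMAP reducer and assigning them to existing HDBSCAN clusters. A minimal sketch, assuming the clusterer was fitted with `prediction_data=True` so the `hdbscan` library's `approximate_predict` can be used; `predict_topics` is a hypothetical helper, and the predictor is injectable so it could be swapped for another implementation.

```python
import numpy as np


def predict_topics(embeddings, reducer, clusterer, predict_fn=None):
    """Map document embeddings to cluster labels using only UMAP + HDBSCAN.

    No tf-idf or topic representations are touched. Returns one label per
    document (-1 means outlier) and its membership strength.
    """
    if predict_fn is None:
        # Deferred import so the helper can be exercised without hdbscan.
        from hdbscan import approximate_predict as predict_fn
    reduced = reducer.transform(np.asarray(embeddings))
    labels, strengths = predict_fn(clusterer, reduced)
    return labels, strengths
```

These are raw HDBSCAN labels; if BERTopic reordered or merged topics after clustering, they may still need mapping to BERTopic's own topic ids.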
Here is what I worked on; it should help others get started on the way to a billion BERTopic inferences :) https://github.com/cedivad/BERTopic-deploy
I was also looking for this feature. I assume the models aren't binary-compatible, so we can't use a model created by cuML with, say, the CPU hdbscan library's (scikit-learn-contrib) approximate_predict?
I've looked at the CPU implementation (scikit-learn-contrib's hdbscan), and it seems to use a brute-force approach, calculating the distance to each centroid one by one. On a GPU, I'm thinking yes, you...
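A brute-force search like this vectorises well: all point-to-centroid distances for a batch can be computed as one matrix operation, which is the kind of work a GPU handles efficiently. A minimal NumPy sketch of nearest-centroid assignment, assuming one centroid per cluster; this is a simplification for illustration, not a reimplementation of `approximate_predict`, and `nearest_centroid` is a hypothetical name. An array library that mirrors the NumPy API, such as CuPy, could in principle run the same code on a GPU.

```python
import numpy as np


def nearest_centroid(points, centroids, batch_size=65_536):
    """Assign each point to its closest centroid (squared Euclidean distance).

    Uses ||p - c||^2 = ||p||^2 - 2 p.c + ||c||^2, so each batch costs one
    matrix multiply instead of a Python loop over centroids.
    """
    points = np.asarray(points, dtype=np.float32)
    centroids = np.asarray(centroids, dtype=np.float32)
    c_sq = np.einsum("ij,ij->i", centroids, centroids)   # ||c||^2, shape (k,)
    labels = np.empty(len(points), dtype=np.int64)
    dists = np.empty(len(points), dtype=np.float32)
    for start in range(0, len(points), batch_size):
        p = points[start:start + batch_size]
        p_sq = np.einsum("ij,ij->i", p, p)[:, None]       # ||p||^2, shape (b, 1)
        d = p_sq - 2.0 * (p @ centroids.T) + c_sq[None, :]  # shape (b, k)
        idx = np.argmin(d, axis=1)
        labels[start:start + len(p)] = idx
        # Clamp tiny negatives caused by floating-point cancellation.
        dists[start:start + len(p)] = np.maximum(d[np.arange(len(p)), idx], 0.0)
    return labels, dists
```

Batching bounds the size of the (batch, k) distance matrix, which matters with thousands of clusters; the returned distances are squared.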