BERTopic

get broader topics - clusters with larger variance

Open marcel0307 opened this issue 3 years ago • 1 comments

My BERTopic result has too many clusters, and they are too specific. Is there any way to increase the variance of the clusters? I understand that a high value for min_samples will lead to fewer clusters being created, but the variance of each cluster remains low. I would like to get topics that are broader and less specific. Is there any way to achieve that?

marcel0307 avatar Aug 09 '22 14:08 marcel0307

You can increase the min_topic_size parameter to get topics that typically consist of more documents. It depends on the dataset, but if a topic contains more documents, the resulting topic representation tends to become broader. Similarly, the n_neighbors parameter in UMAP controls whether the dimensionality reduction focuses on local or global structure; higher values emphasize global structure. In your case, focusing a bit more on global structure might be the solution.
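
A minimal sketch of how these two settings could be combined is shown here. The helper name build_broad_topic_model and the values 50 and 50 are illustrative assumptions, not tuned recommendations; the remaining UMAP arguments are, to my knowledge, the ones BERTopic uses by default, repeated so that only n_neighbors changes.

```python
def build_broad_topic_model(min_topic_size=50, n_neighbors=50):
    """Return an unfitted BERTopic model biased toward fewer, broader topics.

    Hypothetical helper for illustration. Both defaults are assumed starting
    points that sit above the library defaults (min_topic_size=10 in BERTopic,
    n_neighbors=15 in its default UMAP model).
    """
    # Imported lazily so this module loads even where the heavy
    # dependencies (bertopic, umap-learn) are not installed.
    from bertopic import BERTopic
    from umap import UMAP

    # Larger n_neighbors makes UMAP weigh global structure more heavily.
    umap_model = UMAP(
        n_neighbors=n_neighbors,
        n_components=5,
        min_dist=0.0,
        metric="cosine",
        random_state=42,  # fixed seed so repeated runs are comparable
    )

    # Larger min_topic_size requires more documents per topic.
    return BERTopic(umap_model=umap_model, min_topic_size=min_topic_size)


# Usage, assuming `docs` is a list of strings:
# topic_model = build_broad_topic_model()
# topics, probs = topic_model.fit_transform(docs)
```

Good values depend on the dataset, so it is worth raising them step by step and inspecting the resulting topics after each run.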

MaartenGr avatar Aug 10 '22 06:08 MaartenGr

Due to inactivity, I'll be closing this for now. Let me know if you have any other questions related to this and I'll make sure to re-open the issue!

MaartenGr avatar Sep 27 '22 08:09 MaartenGr