get broader topics - clusters with larger variance
My BERTopic results contain too many clusters, and the clusters are too specific. Is there any way to increase the variance within each cluster? I understand that a high value for min_samples leads to fewer clusters being created, but the variance of each cluster remains low. I would like to get topics that are broader and less specific. Is there any way to achieve that?
You can increase the min_topic_size parameter (default 10) so that each topic must contain more documents. It depends on the dataset, but when a topic contains more documents, its representation usually becomes broader. Similarly, the n_neighbors parameter in UMAP (15 in BERTopic's default UMAP model) controls whether the dimensionality reduction focuses on local or global structure: low values preserve fine-grained local neighborhoods, while higher values preserve more of the global structure. In your case, raising n_neighbors to focus a bit more on global structure might be the solution.
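As a rough illustration of the two suggestions above, here is a minimal sketch. The helper names `broader_topic_settings` and `build_topic_model` are hypothetical, and the values 50 and 50 are only illustrative starting points, not recommendations; the remaining UMAP arguments mirror BERTopic's defaults. Suitable values depend on your dataset, so expect to experiment.

```python
def broader_topic_settings(min_topic_size=50, n_neighbors=50):
    """Return (umap_kwargs, bertopic_kwargs) that lean toward broader topics.

    The defaults here are illustrative assumptions, not tuned values.
    """
    umap_kwargs = {
        "n_neighbors": n_neighbors,  # higher than 15: more weight on global structure
        "n_components": 5,           # same as BERTopic's default UMAP model
        "min_dist": 0.0,
        "metric": "cosine",
        "random_state": 42,          # optional, makes runs reproducible
    }
    bertopic_kwargs = {
        "min_topic_size": min_topic_size,  # higher than 10: larger, fewer topics
    }
    return umap_kwargs, bertopic_kwargs


def build_topic_model(min_topic_size=50, n_neighbors=50):
    """Build a BERTopic model using the settings above."""
    # Imported here so the settings helper can be used without these packages.
    from bertopic import BERTopic
    from umap import UMAP

    umap_kwargs, bertopic_kwargs = broader_topic_settings(min_topic_size, n_neighbors)
    return BERTopic(umap_model=UMAP(**umap_kwargs), **bertopic_kwargs)


# Usage, assuming `docs` is your list of document strings:
# topic_model = build_topic_model()
# topics, probs = topic_model.fit_transform(docs)
```

Note that min_topic_size is only applied when you let BERTopic create its own HDBSCAN model, as in this sketch. If you pass a custom hdbscan_model, set min_cluster_size on that model instead.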
Due to inactivity, I'll be closing this for now. Let me know if you have any other questions related to this and I'll make sure to re-open the issue!