Hierarchical Visualization of the topics using HDBSCAN

Open e-barrere opened this issue 3 years ago • 2 comments

Hello,

Thank you for this fantastic work, BERTopic is really useful. I was wondering why the visualization of the hierarchy is based on the c_tf_idf results. Since HDBSCAN already produces a hierarchical result, why recalculate a distance representation from the c_tf_idf rather than use the HDBSCAN result directly?

Thank you

e-barrere • Aug 04 '22 15:08

The main reason for this is modularity. Although HDBSCAN is the default model, other clustering algorithms, such as k-Means, can be used instead. To support any clustering technique, this step needs to be somewhat independent of the clustering model. There is also something to be said for comparing the end result, the topic representations, and to a lesser extent the clusters. That, however, might just be semantics, although it does follow the philosophy of modularity as presented in the package.
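To illustrate the modularity argument, here is a minimal sketch, not BERTopic's actual code. It assumes each topic is already summarized as a vector of term weights (the toy values below are made up and only stand in for c_tf_idf rows) and builds a hierarchy from those vectors alone. The helper names `cosine_distance` and `build_hierarchy` are hypothetical, and average linkage is an arbitrary choice for the sketch.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def build_hierarchy(topic_vectors):
    """Average-linkage agglomerative merging over topic vectors.

    Only the topic vectors are needed, so the same function works no
    matter which clustering model (HDBSCAN, k-Means, ...) produced them.
    Returns a list of merges: (members_a, members_b, distance).
    """
    clusters = [(i,) for i in range(len(topic_vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two groups of topics.
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                dist = sum(
                    cosine_distance(topic_vectors[a], topic_vectors[b])
                    for a, b in pairs
                ) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical topic-term weights: one row per topic, one column per term.
topic_term_weights = [
    [0.9, 0.8, 0.0, 0.0],  # topic 0
    [0.8, 0.9, 0.1, 0.0],  # topic 1
    [0.0, 0.2, 0.9, 0.7],  # topic 2
    [0.0, 0.0, 0.7, 0.9],  # topic 3
]

merges = build_hierarchy(topic_term_weights)
for members_a, members_b, dist in merges:
    print(members_a, "+", members_b, "at distance", round(dist, 3))
```

In this toy input, topics 0 and 1 merge first, then topics 2 and 3, and the two pairs are joined last. Nothing in the function refers to the clustering model, which is why a hierarchy built this way stays available when HDBSCAN is swapped out for a model that has no built-in hierarchy.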

MaartenGr • Aug 04 '22 19:08

I get it now, thank you for your answer!

e-barrere • Aug 08 '22 08:08