Hierarchical Visualization of the topics using HDBSCAN
Hello,
Thank you for this fantastic work; BERTopic is really useful. I was wondering why the visualization of the hierarchy is based on the c_tf_idf results. Since HDBSCAN already produces a hierarchical result, why recalculate a distance representation from the c_tf_idf rather than using the HDBSCAN result?
Thank you
The main reason for this is modularity. Although HDBSCAN is the default model, other clustering algorithms, such as k-Means, can be used instead. To support any clustering technique, this step needs to be somewhat independent of the clustering model that produced the topics. There is also something to be said for comparing the end result, namely the topic representations and, to a lesser extent, the clusters. That, however, might just be semantics, although it does follow the philosophy of modularity in the package.
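To illustrate why a c-TF-IDF-based hierarchy does not depend on the clustering model, here is a minimal sketch that needs nothing but a document-term count matrix and one topic label per document, so the labels could come from HDBSCAN, k-Means, or anything else. It is an assumption-laden toy, not BERTopic's actual implementation: the function names `c_tf_idf` and `topic_hierarchy` are hypothetical, the weighting `tf * log(1 + A / f_t)` is a simplified class-based TF-IDF, and cosine distance with average linkage are choices made here for illustration rather than what `visualize_hierarchy` necessarily uses.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def c_tf_idf(doc_term_counts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Simplified class-based TF-IDF (an assumption): one row per topic, one column per term."""
    topics = np.unique(labels)  # sorted topic ids from ANY clustering model
    # Join all documents of a topic into one "class document" by summing counts.
    tf = np.vstack(
        [doc_term_counts[labels == t].sum(axis=0) for t in topics]
    ).astype(float)
    avg_words = tf.sum() / len(topics)          # A: average number of words per topic
    term_freq = np.maximum(tf.sum(axis=0), 1)   # f_t, guarded against unused terms
    idf = np.log(1.0 + avg_words / term_freq)
    # Normalize term counts within each topic, then weight by idf.
    return (tf / tf.sum(axis=1, keepdims=True)) * idf


def topic_hierarchy(doc_term_counts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Agglomerative linkage over topics using cosine distance between c-TF-IDF rows."""
    weights = c_tf_idf(doc_term_counts, labels)
    return linkage(pdist(weights, metric="cosine"), method="average")


# Toy bag-of-words counts; columns are the terms ["ball", "goal", "vote", "law"].
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 1, 4, 1],
])

# Labels as they might come from k-Means (4 topics) ...
Z = topic_hierarchy(counts, np.array([0, 0, 1, 2, 2, 3]))
groups = fcluster(Z, t=2, criterion="maxclust")
print(groups)  # topics 0 and 1 share one group; topics 2 and 3 share the other

# ... or from a completely different clustering model (2 topics).
Z2 = topic_hierarchy(counts, np.array([0, 0, 0, 1, 1, 1]))
```

Because `topic_hierarchy` only ever sees the labels, swapping the clustering model changes nothing in this step, which is the modularity argument above. Relying on HDBSCAN's own condensed tree would instead tie the visualization to that one algorithm.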
I get it now, thank you for your answer!