hdbscan-cpp

Inquiry About Adding Condensed Tree and EOM in C++ HDBSCAN

Open YUANMENG-1 opened this issue 11 months ago • 0 comments

  • I am currently working on a project where I need to run HDBSCAN clustering at large scale (millions of runs, each with 600-700 data points). Initially, I used Python's HDBSCAN implementation, but because of performance issues I tried GPU HDBSCAN and Fast HDBSCAN. Neither was fast enough for this workload, so I decided to switch to the C++ version of HDBSCAN.

  • While using the C++ version, I noticed that it produces significantly fewer noise points than Python's HDBSCAN. This led me to wonder why the condensed tree and EOM (Excess of Mass) cluster selection, both of which are available in Python's HDBSCAN implementation, are not included in the C++ version.

  • I would like to ask whether it would be possible to add a similar condensed tree and EOM selection step to the C++ implementation. My primary concern is whether adding these features, as in Python's HDBSCAN, would significantly slow down the C++ implementation.

  • Your insights and advice on this matter would be greatly appreciated.
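
To make the request concrete, here is a minimal sketch of the EOM selection step as I understand it, run over an already-built condensed tree. The `CondensedNode` struct and the `selectEom` function are hypothetical names for illustration and are not part of hdbscan-cpp; the sketch also assumes each node's stability has already been computed and that every child has a larger index than its parent. A cluster is kept when its own stability is at least the summed stability of the best selection among its descendants, and ties go to the parent.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical condensed-tree node (illustration only, not an hdbscan-cpp type).
struct CondensedNode {
    int parent;                 // index of the parent cluster, -1 for the root
    double stability;           // precomputed cluster stability
    std::vector<int> children;  // indices of child clusters (empty for a leaf)
};

// Excess of Mass selection.
// Assumption: every child has a larger index than its parent.
// Returns a flag per node: true if that cluster is part of the final selection.
std::vector<bool> selectEom(const std::vector<CondensedNode>& nodes,
                            bool allowRoot = false) {
    const std::size_t n = nodes.size();
    std::vector<double> best(n, 0.0);   // best total stability within each subtree
    std::vector<bool> selected(n, false);

    // Bottom-up pass: compare each cluster with the best choice among its children.
    for (std::size_t i = n; i-- > 0;) {
        double childSum = 0.0;
        for (int c : nodes[i].children) {
            childSum += best[static_cast<std::size_t>(c)];
        }
        const bool rootBlocked = (nodes[i].parent < 0) && !allowRoot;
        if (!rootBlocked && nodes[i].stability >= childSum) {
            selected[i] = true;
            best[i] = nodes[i].stability;
        } else {
            best[i] = childSum;
        }
    }

    // Top-down pass: drop every cluster that lies beneath a selected ancestor.
    std::vector<bool> covered(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        const int p = nodes[i].parent;
        if (p >= 0) {
            const std::size_t pi = static_cast<std::size_t>(p);
            if (covered[pi] || selected[pi]) {
                covered[i] = true;
                selected[i] = false;
            }
        }
    }
    return selected;
}
```

Both passes visit each cluster once, so this step is linear in the number of clusters in the condensed tree; the extra cost I am unsure about is building the condensed tree and computing the stabilities in the first place.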

Thank you for your time and consideration.

Best regards,

YUANMENG-1 (Apr 06 '25 02:04)