Difference between DBSCAN and HDBSCAN

  • Simply put, HDBSCAN behaves like DBSCAN with the value of eps chosen automatically; instead of one global eps, it can in effect use a different density threshold for each cluster
  • Experiment with real data to observe behavior

(image: illustration of EOM (Excess of Mass), the default cluster selection criterion for HDBSCAN; source: https://pberba.github.io/stats/2020/01/17/hdbscan/)

2024-11-14 (image: results of the experiment)

  • Lower left
    • HDBSCAN recognizes the clearly separated cluster in the lower left as a single whole cluster, regardless of the parameters.
    • DBSCAN treats the edges of that cluster as noise. As the parameters change, the cluster gradually shrinks, and eventually all of it is judged to be noise.
  • Right
    • The two behave much the same here, but HDBSCAN is more likely to absorb the surrounding noise into the cluster.
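The shrinking behavior of DBSCAN described above can be reproduced with a small sweep over eps. This is only a sketch: the single Gaussian blob stands in for a "clearly separated cluster" and the eps values are arbitrary choices, not the ones behind the image.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical stand-in for one well-separated cluster:
# a Gaussian blob with a dense core and sparse edges.
X = rng.normal(loc=0.0, scale=1.0, size=(300, 2))

# Sweep eps from very large to very small with min_samples fixed.
eps_values = [100.0, 1.0, 0.5, 0.3, 0.2, 0.1, 1e-6]
noise_counts = []
for eps in eps_values:
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_noise = int(np.sum(labels == -1))  # -1 marks noise
    noise_counts.append(n_noise)
    print(f"eps={eps:g}: clustered={len(X) - n_noise}, noise={n_noise}")
```

With `min_samples` fixed, the number of noise points can only stay the same or grow as eps shrinks, so the sparse edges drop out first and, at a small enough eps, every point is labeled noise.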

  • DBSCAN - scikit-learn 1.5.2 documentation
  • DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN
  • HDBSCAN - scikit-learn 1.5.2 documentation
  • https://note.com/navy_azalea/n/na859d7ab6ab3
  • SpectralClustering - scikit-learn 1.5.2 documentation


This page is auto-translated from /nishio/DBSCANとHDBSCANの違い using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.