> K-Means has a few problems however. The first is that it isn't a clustering algorithm, it is a partitioning algorithm. That is to say K-means doesn't "find clusters" it partitions your dataset into as many (assumed to be globular) chunks as you ask for by attempting to minimize intra-partition distances.

--- Comparing Python Clustering Algorithms - hdbscan 0.8.1 documentation

There are several problems with K-Means: first, it is a partitioning algorithm, not a clustering algorithm. That is, rather than "finding clusters," K-means attempts to minimize the distances within partitions, dividing the dataset into the required number of (assumed to be spherical) chunks.
Okay, this is not the most common wording, but it's a nice distinction.
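The partitioning behavior described in the quote can be illustrated with a small sketch. This is a plain Lloyd's-algorithm K-Means written only for illustration (the function name `kmeans`, its parameters, and the uniformly random test data are assumptions, not taken from the hdbscan documentation). The data has no cluster structure at all, yet K-Means still hands back the number of chunks that was requested.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda j: (x - centers[j][0]) ** 2 + (y - centers[j][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty partition
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Uniform noise in the unit square: there is nothing here to "find".
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

sizes_by_k = {}
for k in (2, 3, 5):
    labels, _ = kmeans(points, k)
    sizes_by_k[k] = [labels.count(j) for j in range(k)]
    print(k, sizes_by_k[k])  # partition sizes for each requested k
```

Whatever `k` is passed in, every point is assigned to one of `k` partitions. The algorithm never reports "there are no clusters here", which is the sense in which it partitions rather than clusters.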
- K-Means (k-method) and Spectral Clustering are commonly called "[clustering
- However, the claim here is that these methods are not finding clusters; they are dividing the data set into chunks, which is, so to speak, "partitioning".
- The DBSCAN paper also points out that the definition of the concept of a cluster is vague to begin with.
  - Cluster definition in DBSCAN
    - > Cluster: a maximal set of points that are all density-connected to each other; any point that is density-reachable from a point in the cluster also belongs to the cluster.
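
As a contrast with partitioning, the density-based definition above can be sketched in a few lines. This is a minimal brute-force illustration, not the paper's or any library's implementation: the names `region_query` and `dbscan` are hypothetical, and `eps` and `min_pts` stand in for the neighborhood radius and the minimum neighbor count. The number of clusters is never passed in; it falls out of which points are density-reachable from one another, and points that belong nowhere are labeled as noise.

```python
from collections import deque


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        # i is a core point: collect everything density-reachable from it.
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core, keep expanding
        cluster += 1
    return labels


# Two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Here two clusters are found without asking for two, and the isolated point is left out of both, something a partitioning method cannot express.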
This page is auto-translated from /nishio/クラスタリングとパーティショニング using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.