개발자비행일지

What Is Clustering?

Cyber0946 2023. 1. 10. 10:42

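Clustering is the task of partitioning a dataset into groups (clusters) so that points in the same group are more similar to one another than to points in other groups. It is unsupervised: the groups are inferred from patterns in the data rather than from labels. Commonly used methods include: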

  1. K-Means Clustering: This is an iterative algorithm that partitions n data points into k clusters. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of the points assigned to it, until the assignments stop changing.
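  2. Hierarchical Clustering: This method builds a hierarchy of nested clusters, usually drawn as a dendrogram. There are two types: Agglomerative (bottom-up), which starts with every point as its own cluster and repeatedly merges the two most similar clusters, and Divisive (top-down), which starts with a single cluster and repeatedly splits it.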
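  3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise): This method groups points that lie in dense regions. It takes two parameters: a neighborhood radius (Eps) and the minimum number of points (MinPts) that must fall within that radius for a point to count as a core point. Points that are not density-reachable from any core point are labeled as noise.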
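  4. Gaussian Mixture Model: This is a probabilistic model that represents the data as a mixture of several underlying normal distributions, typically fitted with the Expectation-Maximization (EM) algorithm. It suits continuous data whose clusters are roughly Gaussian. With full covariance matrices the clusters can be ellipsoidal rather than spherical, and each point receives a probability of belonging to each cluster instead of a hard assignment.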
  5. Spectral Clustering: This method builds a similarity matrix over the data, uses eigenvectors of the graph Laplacian derived from that matrix to embed the points in a lower-dimensional space, and then applies a standard algorithm such as K-Means Clustering in that space.
  6. Affinity Propagation: Data points exchange messages that express how well suited each point is to serve as a representative, or exemplar, for another point. As the messages converge, a set of exemplars emerges and every point is assigned to its exemplar, so the number of clusters does not have to be specified in advance.
  7. Mean-Shift Clustering: This is a non-parametric method that iteratively shifts each point toward a region of higher density until it reaches a mode (density peak) of the estimated data distribution. Points that converge to the same mode form a cluster.
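
To make the first method in the list concrete, here is a minimal sketch of K-Means in plain Python. It is an illustration under stated assumptions, not a production implementation: the function names `kmeans` and `farthest_first`, the toy data, and the farthest-first initialization are all choices made for this example (random initialization or k-means++ are more common in practice), and Euclidean distance is assumed.

```python
import math


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Alternate the assignment and update steps until labels stop changing."""
    centroids = farthest_first(points, k)
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return labels, centroids


# Two well-separated toy groups in 2-D (made-up data for illustration).
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On data this cleanly separated the loop converges after a single update. On real data the result depends on the initialization and on k, which is why library implementations usually run several initializations and keep the best one.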