« Back to Glossary Index

K-means is a widely used unsupervised machine learning algorithm for clustering, which partitions a dataset into ‘k’ distinct, non-overlapping subsets (clusters) based on similarity. The objective is to group similar data points together, minimizing the variance within each cluster.

How K-means Works:

  1. Initialization: Select ‘k’ initial centroids randomly from the dataset.
  2. Assignment: Assign each data point to the nearest centroid, forming ‘k’ clusters.
  3. Update: Recalculate the centroids by computing the mean of all data points in each cluster.
  4. Repeat: Repeat the assignment and update steps until convergence, i.e., when the centroids no longer change significantly.

Considerations:

  • Choosing ‘k’: The number of clusters, ‘k’, must be specified beforehand. Techniques like the elbow method can help determine an appropriate ‘k’.
  • Initialization Sensitivity: The algorithm’s outcome can be sensitive to the initial placement of centroids. To mitigate this, it’s common to run the algorithm multiple times with different initializations and select the best result.
  • Cluster Shape Assumption: K-means assumes clusters are spherical and equally sized, which may not suit all datasets.
« Back to Glossary Index