K-Means Clustering
- K-Means is a popular clustering algorithm that divides data into K clusters based on similarity using centroid-based grouping.
K-Means is an unsupervised learning algorithm used for clustering: grouping similar data points together based on their distance to cluster centers.
Centroid Concept
Each cluster has a centroid (the mean position of the points in that cluster)
The algorithm assigns each data point to its nearest centroid
After assignment, centroids are recalculated iteratively until they stabilize
Steps
1. Initialize K centroids randomly
2. Assign each data point to the nearest centroid
3. Recalculate each centroid as the mean of the points in its cluster
4. Repeat steps 2-3 until the centroids no longer change
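The steps above can be sketched as a minimal NumPy implementation (the function name `kmeans`, the choice of initializing centroids from random data points, and the iteration cap are assumptions; empty-cluster handling is omitted for brevity):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Step 1: initialize K centroids by picking k distinct data points at random
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # Step 4: stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
centroids, labels = kmeans(X, k=2)
```

Note that the final clustering depends on the random initialization in Step 1, which is exactly why methods such as multiple restarts are used in practice.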
K Value Selection
K = Number of clusters
Choosing the right K is critical for meaningful clusters
Common Methods:
Elbow Method
Silhouette Score
Elbow Method
Plot Inertia (within-cluster sum of squares) vs number of clusters (K)
Inertia = Sum of squared distances of points from their cluster centroid
Inertia = \sum_{i=1}^{K} \sum_{x \in C_i} ||x - \mu_i||^2
Look for “elbow point” where inertia stops decreasing sharply → optimal K
Inertia
Measures compactness of clusters
Lower inertia → points closer to centroids → tighter clusters
Too low inertia → might overfit (too many clusters)
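Alongside inertia, the Silhouette Score listed under Common Methods measures how close each point is to its own cluster compared with the nearest other cluster (values near 1 mean compact, well-separated clusters). A sketch using scikit-learn's `silhouette_score` on the same toy data, assuming scikit-learn is installed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Silhouette score is defined for 2 <= K <= n_samples - 1
scores = {}
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))
```

Unlike inertia, which always decreases as K grows, the silhouette score peaks at a good K, so it can be read off directly rather than by eyeballing an elbow.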
Advantages & Limitations
Advantages
Simple and easy to implement
Fast and efficient on large datasets
Works well for spherical clusters
Limitations
Need to specify K beforehand
Sensitive to initial centroid placement
Poor performance for non-spherical clusters
Sensitive to outliers
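The sensitivity to initial centroid placement is commonly mitigated with the k-means++ initialization plus multiple restarts, which scikit-learn exposes through the `init` and `n_init` parameters (a sketch, assuming scikit-learn is available):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# 'k-means++' spreads the initial centroids apart, and n_init=10 reruns
# the algorithm from 10 different initializations, keeping the run with
# the lowest inertia.
kmeans = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=42)
kmeans.fit(X)
print(kmeans.inertia_)  # best inertia over the 10 restarts
```

This does not address the other limitations (K must still be chosen, and non-spherical clusters remain a poor fit), but it makes the result far less dependent on a single unlucky initialization.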
Example: K-Means Clustering in Python with the Elbow Method
This Python example demonstrates how to use the K-Means clustering algorithm to group data points into clusters. The code first applies the Elbow Method to determine the optimal number of clusters (K) by analyzing inertia values. After selecting K=2, it trains the K-Means model, predicts cluster labels, and visualizes the clusters along with their centroids using Matplotlib.
# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Step 2: Create Dataset
X = np.array([
    [1, 2],
    [1, 4],
    [1, 0],
    [10, 2],
    [10, 4],
    [10, 0]
])
# Step 3: Elbow Method to find optimal K
inertia = []
K_range = range(1, 6)
for k in K_range:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)
# Plot Inertia vs K
plt.plot(K_range, inertia, 'bo-')
plt.xlabel("Number of Clusters (K)")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()
# Step 4: Apply K-Means with K=2
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# Step 5: Plot Clusters
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red', marker='X')
plt.title("K-Means Clustering")
plt.show()