Hierarchical Clustering

  • Hierarchical clustering is an unsupervised learning technique that creates nested clusters by merging or splitting data groups based on similarity.

    Hierarchical Clustering is an unsupervised learning algorithm used to group similar data points into a hierarchy of clusters. Unlike K-Means, it does not require specifying the number of clusters upfront.


    Agglomerative Clustering (Bottom-Up)

    • Starts with each data point as its own cluster

    • Iteratively merges the closest clusters

    • Stops when all points belong to one cluster or a distance threshold is reached

    Steps:

    1. Assign each point to its own cluster

    2. Compute distances between all clusters

    3. Merge the closest clusters

    4. Repeat until all points merge into a single cluster
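The four steps above can be sketched in plain Python. This naive version (for illustration only, not for large datasets) uses single linkage, i.e. the minimum pairwise distance between clusters, and records each merge until one cluster remains:

```python
import numpy as np

def agglomerative_merges(X):
    """Naive agglomerative clustering (single linkage), for illustration only."""
    clusters = [[i] for i in range(len(X))]   # Step 1: each point is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Step 2: compute the distance between every pair of clusters
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Step 3: merge the closest pair of clusters
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges  # Step 4: repeated until a single cluster remains

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
for left, right, d in agglomerative_merges(X):
    print(left, "+", right, "at distance", round(d, 2))
```

On this toy dataset the three left-hand points merge first (at distance 2), then the three right-hand points, and the final merge joins the two groups at distance 9.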


    Divisive Clustering (Top-Down)

    • Starts with all points in one cluster

    • Iteratively splits clusters into smaller clusters

    • Continues until each point is its own cluster
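Exact divisive clustering is expensive, so in practice the split step is usually approximated. One common approach (a sketch, not the only option) is bisecting K-Means: repeatedly split the largest cluster into two with KMeans(k=2). The helper `divisive_split` below is a hypothetical name for illustration; scikit-learn also ships a ready-made `BisectingKMeans` estimator.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(X, max_clusters=3, seed=0):
    """Top-down (divisive) clustering sketch: repeatedly bisect the largest cluster."""
    clusters = [np.arange(len(X))]          # start with all points in one cluster
    while len(clusters) < max_clusters:
        # pick the largest cluster and split it into two with KMeans(k=2)
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
for c in divisive_split(X, max_clusters=2):
    print("cluster:", c)
```

Running this until each point is its own cluster (max_clusters = len(X)) reproduces the full top-down hierarchy.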


    Dendrogram

    A dendrogram is a tree-like diagram showing hierarchical relationships between clusters.

    • Y-axis → Distance at which clusters are merged

    • X-axis → Data points

    • Cutting the dendrogram at a chosen height selects the number of clusters

    Example Dendrogram Interpretation

    • Short distance → Similar points

    • Tall branches → Less similar points

    • Cut the tree → Desired number of clusters
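The "cut the tree" step can also be done programmatically: SciPy's `fcluster` cuts a linkage tree either at a target number of clusters or at a distance threshold. A minimal sketch on the same kind of toy data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
Z = linkage(X, method='ward')

# Cut by number of clusters...
labels_k = fcluster(Z, t=2, criterion='maxclust')
# ...or by height: merges above distance t are undone, leaving separate clusters
labels_h = fcluster(Z, t=5.0, criterion='distance')

print("maxclust:", labels_k)
print("distance:", labels_h)
```

Note that `fcluster` numbers clusters starting from 1, unlike scikit-learn's 0-based labels.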


    Linkage Methods

    Linkage defines how the distance between two clusters is measured when deciding which pair to merge.

    • Single Linkage → Distance between the closest points of the two clusters

    • Complete Linkage → Distance between the farthest points of the two clusters

    • Average Linkage → Average distance between all pairs of points across the two clusters

    • Ward Linkage → Merges the pair that minimizes the increase in within-cluster variance (most common)
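The choice of linkage changes the merge heights in the tree. A quick way to see this (a sketch on toy data) is to build the linkage matrix with each method and print the distance of the final merge, stored in the last row's third column:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)
    # Z[-1, 2] is the distance at which the final two clusters merge
    print(f"{method:>8}: final merge at distance {Z[-1, 2]:.2f}")
```

Single linkage gives the smallest final height (closest cross-cluster pair), complete the largest pairwise height, and Ward's variance-based distance is on a different scale from the raw Euclidean ones.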

    Example: Agglomerative Clustering in Python with a Dendrogram

This Python example demonstrates Hierarchical Clustering using Agglomerative Clustering. The code first creates a small dataset and generates a dendrogram using the Ward linkage method to visualize how data points are merged step by step. Guided by the dendrogram, it then groups the data into two clusters with AgglomerativeClustering and prints the cluster labels.

# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# Step 2: Create Dataset
X = np.array([
    [1, 2],
    [1, 4],
    [1, 0],
    [10, 2],
    [10, 4],
    [10, 0]
])

# Step 3: Create Linkage Matrix for Dendrogram
linked = linkage(X, method='ward')  # 'ward', 'single', 'complete', 'average'

# Step 4: Plot Dendrogram
plt.figure(figsize=(8,5))
dendrogram(linked,
           orientation='top',
           distance_sort='descending',
           show_leaf_counts=True)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Data Points")
plt.ylabel("Distance")
plt.show()

# Step 5: Agglomerative Clustering (choose 2 clusters)
model = AgglomerativeClustering(n_clusters=2, metric='euclidean', linkage='ward')
labels = model.fit_predict(X)
print("Cluster Labels:", labels)
  • Output: the script displays the dendrogram and prints the cluster labels, e.g. Cluster Labels: [1 1 1 0 0 0] (the two groups are always the same three points each, though the 0/1 numbering may vary).