DBSCAN Clustering

DBSCAN is a density-based clustering algorithm that groups closely packed data points and identifies noise or outliers in datasets.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised learning algorithm used for clustering based on density.
It is especially useful for finding arbitrarily shaped clusters and for handling outliers.

Density-Based Clustering
- Clusters are formed where points are densely packed together
- Sparse regions are considered noise or outliers
- Unlike K-Means, the number of clusters does not need to be specified in advance
Epsilon (ε)
- ε defines the radius of a neighborhood around a point
- Points within this distance are considered neighbors
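As a rough illustration of the ε-neighborhood, the sketch below finds every point within a given radius of one chosen point using Euclidean distance. The toy data, the value of eps, and the helper name eps_neighbors are assumptions made for this example only.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration
points = np.array([[1.0, 2.0], [2.0, 2.0], [2.0, 3.0], [8.0, 7.0]])
eps = 1.5  # neighborhood radius (ε)

def eps_neighbors(points, index, eps):
    """Return indices of all points within eps of points[index] (the point itself included)."""
    distances = np.linalg.norm(points - points[index], axis=1)
    return np.flatnonzero(distances <= eps)

neighbors_of_first = eps_neighbors(points, 0, eps)
print("Neighbors of point 0:", neighbors_of_first.tolist())
```

Here the first three points fall inside the radius, while [8, 7] is too far away to be a neighbor.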
Minimum Points (MinPts)
- MinPts = minimum number of points that must lie in a point's ε-neighborhood for it to count as part of a dense region (in scikit-learn, min_samples includes the point itself)
- Helps distinguish core points from border points
Core, Border, and Noise Points
Point Type	Definition
Core Point	Has ≥ MinPts points in ε-neighborhood
Border Point	Has < MinPts neighbors but lies in ε-neighborhood of a core point
Noise/Outlier	Not a core or border point
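One possible way to see the three point types in practice is to read them off a fitted scikit-learn model: core_sample_indices_ lists the core points, a label of -1 marks noise, and everything else is a border point. The dataset and parameter values below are hypothetical, picked so that all three types appear.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: four points on a line plus one far-away point
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 10.0]])

model = DBSCAN(eps=1.1, min_samples=3).fit(X)
labels = model.labels_

# Core points are reported directly by scikit-learn
core_mask = np.zeros(len(X), dtype=bool)
core_mask[model.core_sample_indices_] = True

# Noise points carry the label -1; the rest of the clustered points are border points
noise_mask = labels == -1
point_types = np.where(core_mask, "core", np.where(noise_mask, "noise", "border"))

for point, kind in zip(X.tolist(), point_types.tolist()):
    print(point, "->", kind)
```

The two end points of the line have only two points in their neighborhood (fewer than min_samples), yet each sits next to a core point, so they come out as border points.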

How DBSCAN Works
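1. Select an arbitrary unvisited point
2. Find all points in its ε-neighborhood
3. If the neighborhood contains ≥ MinPts points → core point, start a new cluster
4. Expand the cluster by adding the neighbors and, for each neighbor that is itself a core point, its neighbors as well
5. Repeat until every point has been visited
6. Points not in any cluster → noise/outliers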
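The steps above can be sketched in plain NumPy. This is a simplified teaching version, not scikit-learn's implementation: it visits points in index order rather than randomly, uses brute-force distance checks, and the names dbscan_sketch and region_query are made up for this example.

```python
import numpy as np

NOISE = -1
UNVISITED = -2

def region_query(X, i, eps):
    """Indices of all points within eps of X[i] (including i itself)."""
    return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

def dbscan_sketch(X, eps, min_pts):
    labels = np.full(len(X), UNVISITED)
    cluster_id = 0
    for i in range(len(X)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(X, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # previously noise, now a border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(X, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points expand the cluster
        cluster_id += 1
    return labels

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
sketch_labels = dbscan_sketch(X, eps=3, min_pts=2)
print(sketch_labels.tolist())
```

Note that only core points push their neighbors onto the queue, which is what keeps a border point from pulling unrelated points into the cluster.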
Example: DBSCAN

DBSCAN Clustering Example in Python for Detecting Clusters and Outliers

This Python example uses scikit-learn's DBSCAN to group data points by density. The code creates a small dataset, fits DBSCAN with the given eps (neighborhood radius, ε) and min_samples (MinPts), and assigns a cluster label to each point. Noise or outlier points receive the label -1, and the clusters are visualized with Matplotlib.

# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

# Step 2: Create Dataset
X = np.array([
    [1, 2],
    [2, 2],
    [2, 3],
    [8, 7],
    [8, 8],
    [25, 80]
])

# Step 3: Create DBSCAN Model
# eps = neighborhood radius, min_samples = MinPts
dbscan = DBSCAN(eps=3, min_samples=2)
labels = dbscan.fit_predict(X)

# Step 4: Print Cluster Labels
print("Cluster Labels:", labels)  
# -1 means noise/outlier

# Step 5: Plot Clusters
plt.scatter(X[:,0], X[:,1], c=labels, cmap='plasma', s=100)
plt.title("DBSCAN Clustering")
plt.show()

Output:

Cluster Labels: [ 0  0  0  1  1 -1]
Noise & Outliers
- DBSCAN can detect outliers automatically
- Points labeled -1 → noise
- This is an advantage over K-Means, which assigns every point to a cluster
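A short sketch of how the -1 label might be used to pull out the outliers and count the real clusters. It reuses the dataset and parameters from the example; the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
labels = DBSCAN(eps=3, min_samples=2).fit_predict(X)

# Boolean mask that is True for noise points
noise_mask = labels == -1
outliers = X[noise_mask]

# Count clusters, excluding the noise label if present
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print("Outliers:", outliers.tolist())
print("Number of clusters:", n_clusters)
print("Noise points:", int(noise_mask.sum()))
```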
Advantages & Limitations
Advantages
1. Detects clusters of arbitrary shape
2. Handles outliers/noise naturally
3. No need to specify number of clusters
Limitations
1. Sensitive to ε and MinPts parameters
2. Less effective when clusters have widely varying densities, because a single ε and MinPts apply to the whole dataset
3. Performance decreases on high-dimensional data
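To illustrate the sensitivity to ε, the sketch below reruns DBSCAN on the example dataset with several eps values while keeping min_samples fixed. The specific eps values are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

results = {}
for eps in (0.5, 3, 8, 100):
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(X)
    results[eps] = labels.tolist()
    print(f"eps={eps}: {results[eps]}")
```

With a very small eps every point becomes noise, a moderate eps gives two clusters plus one outlier, a larger eps merges the two groups, and a very large eps puts everything, including the outlier, into a single cluster.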
