K-Nearest Neighbors (KNN)

  • K-Nearest Neighbors (KNN) is a simple machine learning algorithm that classifies data points based on the majority class of their nearest neighbors.
  • Distance Metrics

    KNN works based on distance calculation between data points.

    The most common distance metrics are:

    1. Euclidean Distance (Most Common)

    d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}

    Used when data is continuous and properly scaled.

    2. Manhattan Distance

    d = |x_1 - y_1| + |x_2 - y_2|

    Used when features represent grid-like distance.

    3. Minkowski Distance

    Generalized version:

    d = \left( \sum |x_i - y_i|^p \right)^{1/p}

    • p = 1 → Manhattan

    • p = 2 → Euclidean
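The three metrics above can be computed directly; a minimal NumPy sketch with two hypothetical 2-D points:

```python
import numpy as np

# Hypothetical points, chosen only for illustration
x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))  # p = 2
manhattan = np.sum(np.abs(x - y))          # p = 1

def minkowski(a, b, p):
    """Generalized Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

print(euclidean)           # 5.0
print(manhattan)           # 7.0
print(minkowski(x, y, 2))  # 5.0, same as Euclidean
```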

    Important:

    Always scale data before using KNN because distance is sensitive to feature magnitude.

    Use:

    • StandardScaler

    • MinMaxScaler
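A quick sketch of both scalers on a hypothetical [study hours, attendance] matrix:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical feature matrix: [study hours, attendance %]
X = np.array([[2.0, 50.0],
              [3.0, 60.0],
              [5.0, 80.0]])

std = StandardScaler().fit_transform(X)  # each column: mean 0, std 1
mm = MinMaxScaler().fit_transform(X)     # each column mapped into [0, 1]

print(std.mean(axis=0))                # ~[0, 0]
print(mm.min(axis=0), mm.max(axis=0))  # [0, 0] and [1, 1]
```

Without this step, the attendance column (tens) would dominate the study-hours column (single digits) in every distance calculation.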


    Choosing K Value

    K = Number of nearest neighbors.

    Small K (e.g., K=1)

    • Low bias

    • High variance

    • Risk of overfitting

    Large K (e.g., K=20)

    • High bias

    • Low variance

    • Risk of underfitting

    Best Practice

    • Choose an odd K (avoids ties in binary classification)

    • Use cross-validation to find best K
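A cross-validation sweep over odd K values might look like this sketch, using scikit-learn's built-in iris dataset purely as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Score each odd K with 5-fold cross-validation; scaling lives inside
# the pipeline so each fold is scaled using only its own training data.
scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(pipe, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("Best K:", best_k)
```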

    Example

    If K = 3:

    Among nearest 3 neighbors:

    • 2 are Class A

    • 1 is Class B

    Final prediction → Class A
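The majority vote above is a one-liner with `collections.Counter`:

```python
from collections import Counter

# Labels of the 3 nearest neighbors, as in the example above
neighbor_labels = ["A", "A", "B"]

# most_common(1) returns [(label, count)] for the winning class
prediction = Counter(neighbor_labels).most_common(1)[0][0]
print(prediction)  # A
```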


    Lazy Learning

    KNN is called a Lazy Learning Algorithm because:

    • It does NOT build a model during training.

    • It stores the entire dataset.

    • Computation happens only at prediction time.

    Why “Lazy”?

    Training phase:

    • Just store data.

    Prediction phase:

    • Calculate distance to all points.

    • Sort them.

    • Select K nearest.

    So prediction is slower.
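The prediction-time steps above can be sketched from scratch (plain Euclidean distance on unscaled features, for brevity; the query point is hypothetical):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # "Training" already happened: X_train and y_train were simply stored.
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # distance to ALL points
    nearest = np.argsort(distances)[:k]                          # sort, keep K nearest
    return Counter(y_train[nearest]).most_common(1)[0][0]        # majority vote

X_train = np.array([[2, 50], [3, 60], [5, 80], [6, 90], [1, 40]])
y_train = np.array([0, 0, 1, 1, 0])  # 0 = Fail, 1 = Pass

print(knn_predict(X_train, y_train, np.array([4, 65])))  # → 0
```

All the work (distance to every stored point, sorting, voting) happens inside `knn_predict`, which is exactly why prediction, not training, is the expensive phase.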


    Advantages & Limitations

    Advantages

    1. Simple and easy to understand

    2. No training phase required

    3. Works well with small datasets

    4. Can handle multi-class problems

    Limitations

    1. Slow prediction (computes distance to all points)

    2. Sensitive to irrelevant features

    3. Sensitive to feature scaling

    4. Poor performance on large datasets

    5. Curse of dimensionality (many features reduce accuracy)

    Example: Pass/Fail Classification

The following Python example uses KNN to predict whether a student will pass or fail based on study hours and attendance. It creates a small dataset, splits it into training and testing sets, scales the features with StandardScaler, trains a KNN model with K=3, and evaluates it by predicting on the test set and computing accuracy.

# Step 1: Import Libraries
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Step 2: Create Dataset
X = np.array([
    [2, 50],
    [3, 60],
    [5, 80],
    [6, 90],
    [1, 40]
])  # [Study Hours, Attendance]

y = np.array([0, 0, 1, 1, 0])  # 0 = Fail, 1 = Pass

# Step 3: Split Data (fixed random_state for reproducibility; 1 of 5 samples is held out)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Feature Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 5: Create Model (K=3)
model = KNeighborsClassifier(n_neighbors=3)

# Step 6: Train Model
model.fit(X_train, y_train)

# Step 7: Predict
prediction = model.predict(X_test)

print("Prediction:", prediction)
print("Accuracy:", model.score(X_test, y_test))
  • Output:

    The script prints the predicted class for the held-out sample and the model's accuracy on the test set; the exact values depend on which sample the split assigns to the test set. Note that a class probability (e.g. "Probability of Passing") would come from `model.predict_proba(X_test)`, not `predict`.

    Bias-Variance in KNN

    K Value  | Bias | Variance
    ---------|------|---------
    Small K  | Low  | High
    Large K  | High | Low
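The trade-off in the table can be seen empirically; a sketch on the iris dataset (chosen here only for convenience) comparing train vs. test accuracy for a small and a large K:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for k in [1, 25]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    results[k] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"K={k}: train={results[k][0]:.3f}, test={results[k][1]:.3f}")

# K=1 memorizes the training set (train accuracy 1.0) — low bias, high variance.
# K=25 smooths the decision boundary, trading some training fit for stability.
```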