Confusion Matrix

  • A confusion matrix is a table used to evaluate classification models by comparing predicted and actual class labels.

    It is a fundamental evaluation tool: it shows the counts of correct and incorrect predictions broken down by each class, so you can see not just how often the model is wrong, but how it is wrong.


    Components of a Confusion Matrix

    | Term                | Definition                                                                |
    |---------------------|---------------------------------------------------------------------------|
    | True Positive (TP)  | Model predicts positive, and the actual label is positive                 |
    | True Negative (TN)  | Model predicts negative, and the actual label is negative                 |
    | False Positive (FP) | Model predicts positive, but the actual label is negative (Type I error)  |
    | False Negative (FN) | Model predicts negative, but the actual label is positive (Type II error) |


    Matrix Structure (2×2 Table)

    For binary classification (Positive = 1, Negative = 0):

    |                 | Predicted Positive | Predicted Negative |
    |-----------------|--------------------|--------------------|
    | Actual Positive | TP                 | FN                 |
    | Actual Negative | FP                 | TN                 |
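    The four cells above can be counted directly from paired labels. A minimal sketch in plain Python (no libraries), assuming labels are encoded as 1 = positive and 0 = negative:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Five predictions: three actual positives, two actual negatives
print(confusion_counts([1, 1, 1, 0, 0], [1, 0, 1, 1, 0]))  # (2, 1, 1, 1)
```

    Each prediction falls into exactly one of the four cells, so TP + TN + FP + FN always equals the total number of predictions.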


    Interpretation

    • High TP & TN → the model is performing well

    • High FP → the model flags negatives as positive → risky in spam filtering (legitimate emails marked as spam)

    • High FN → the model misses actual positives → risky in disease detection (sick patients go undiagnosed)


    Example

    Suppose a spam classifier is tested on 5 emails:

    • Actual Spam Emails = 3

    • Actual Not Spam Emails = 2

    • Model Predictions:

    | Email | Actual   | Predicted |
    |-------|----------|-----------|
    | 1     | Spam     | Spam      |
    | 2     | Spam     | Not Spam  |
    | 3     | Spam     | Spam      |
    | 4     | Not Spam | Spam      |
    | 5     | Not Spam | Not Spam  |

    Confusion Matrix:

    |                 | Predicted Spam | Predicted Not Spam |
    |-----------------|----------------|--------------------|
    | Actual Spam     | TP = 2         | FN = 1             |
    | Actual Not Spam | FP = 1         | TN = 1             |

    Python Example

The code below evaluates the same spam classifier using scikit-learn. It compares the actual and predicted labels (1 = Spam, 0 = Not Spam), computes the confusion matrix, and prints it, showing how many predictions are correct and how many are false positives or false negatives.

# Step 1: Import Libraries
from sklearn.metrics import confusion_matrix
import numpy as np

# Step 2: Actual vs Predicted Labels
y_true = np.array([1, 1, 1, 0, 0])  # 1=Spam, 0=Not Spam
y_pred = np.array([1, 0, 1, 1, 0])

# Step 3: Compute Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)
  • Output:
    Confusion Matrix:
     [[1 1]
     [1 2]]
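  • Note that scikit-learn orders rows and columns by sorted label value, so with labels 0 and 1 the first row is Actual Not Spam: the printed matrix is laid out as [[TN, FP], [FN, TP]], not [[TP, FN], [FP, TN]] as in the hand-built table above. A short sketch unpacking the four counts explicitly with `ravel()`:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0]  # 1 = Spam, 0 = Not Spam
y_pred = [1, 0, 1, 1, 0]

# Labels are sorted ascending, so row/column 0 is "Not Spam" (label 0);
# flattening row-by-row therefore yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=2, TN=1, FP=1, FN=1
```

    The unpacked counts match the hand-built table: TP = 2, FN = 1, FP = 1, TN = 1.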