Advanced Performance Metrics

  • Advanced performance metrics evaluate classification models using probability-based and threshold-based evaluation techniques.
  • Precision

    Precision measures how many of the predicted positives are actually positive.

    Precision = TP / (TP + FP)

    • Focuses on correctness of positive predictions

    • High Precision → Few false positives

    Example: Spam detection

    • Of 100 predicted spam emails, only 80 are truly spam → Precision = 80%
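    The arithmetic in this example can be checked with a tiny sketch; the counts TP = 80 and FP = 20 are taken from the spam scenario above:

    ```python
    # Spam example: 100 emails predicted as spam, 80 truly spam
    tp = 80   # true positives: predicted spam and actually spam
    fp = 20   # false positives: predicted spam but actually legitimate

    precision = tp / (tp + fp)
    print("Precision:", precision)  # 0.8
    ```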


    Recall (Sensitivity / True Positive Rate)

    Recall measures how many of the actual positives are correctly predicted.

    Recall = TP / (TP + FN)

    • Focuses on capturing all actual positives

    • High Recall → Few false negatives

    Example: Spam detection

    • Out of 100 actual spam emails, model detects 90 → Recall = 90%
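    The same kind of quick check works here, with TP = 90 and FN = 10 taken from the spam scenario above:

    ```python
    # Spam example: 100 actual spam emails, 90 detected by the model
    tp = 90   # true positives: spam correctly detected
    fn = 10   # false negatives: spam the model missed

    recall = tp / (tp + fn)
    print("Recall:", recall)  # 0.9
    ```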


    F1 Score

    F1 Score is the harmonic mean of Precision and Recall, balancing both.

    F1=2×Precision×RecallPrecision+RecallF1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}F1=2×Precision+RecallPrecision×Recall​

    • High F1 → Good balance between Precision and Recall

    • Useful when both false positives and false negatives matter
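    Using the illustrative spam numbers from the two sections above (Precision = 0.8, Recall = 0.9), the harmonic mean works out as follows:

    ```python
    precision = 0.8   # from the precision example (80 of 100 predictions correct)
    recall = 0.9      # from the recall example (90 of 100 spam caught)

    # The harmonic mean penalizes imbalance between the two metrics
    f1 = 2 * (precision * recall) / (precision + recall)
    print("F1 Score:", round(f1, 4))  # 0.8471
    ```

    Note that the F1 Score (≈0.847) sits below the arithmetic mean (0.85); the harmonic mean always pulls toward the weaker of the two metrics.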


    Precision-Recall Tradeoff

    • Increasing Precision → Often reduces Recall

    • Increasing Recall → Often reduces Precision

    • Threshold tuning allows finding the right balance based on the problem's requirements

    Example:

    • Spam email classifier:

      • High Precision → Fewer legitimate emails marked as spam (fewer false alarms)

      • High Recall → Most spam emails caught (even if some legitimate emails are flagged)
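    One way to inspect this tradeoff is scikit-learn's `precision_recall_curve`, which computes precision and recall at every candidate threshold. The labels and probability scores below are made-up illustration values, not output from a real classifier:

    ```python
    import numpy as np
    from sklearn.metrics import precision_recall_curve

    # Hypothetical true labels and predicted probabilities from a classifier
    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y_scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70])

    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

    # Each threshold trades recall for precision (or vice versa)
    for p, r, t in zip(precision, recall, thresholds):
        print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
    ```

    Scanning the printed rows shows recall falling as the threshold rises, which is exactly the tradeoff described above; the threshold is then chosen to match the problem's tolerance for false positives versus false negatives.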

    Python Example

Precision, Recall, and F1 Score Calculation in Python for Model Evaluation

This Python example demonstrates how to evaluate a classification model using important metrics such as Precision, Recall, and F1 Score. The code compares actual labels and predicted labels, calculates each metric using scikit-learn, and prints the results. These metrics help measure the quality of predictions, especially in classification problems where class imbalance may occur.

from sklearn.metrics import precision_score, recall_score, f1_score
import numpy as np

# Step 1: Actual vs Predicted
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0])

# Step 2: Calculate Precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)

# Step 3: Calculate Recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)

# Step 4: Calculate F1 Score
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)
  • Output:

    Precision: 1.0

    Recall: 0.75

    F1 Score: 0.8571428571428571
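These numbers can be traced back to the confusion-matrix counts: the example arrays contain 3 true positives, 0 false positives, and 1 false negative, so Precision = 3/3 = 1.0 and Recall = 3/4 = 0.75. A quick check using the same data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 0, 0, 1, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0])

# ravel() flattens the 2x2 matrix into (TN, FP, FN, TP)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn)  # TP: 3 FP: 0 FN: 1
```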