Hyperparameter Tuning (Basic Introduction)

  • This lesson introduces hyperparameter tuning and explains how adjusting hyperparameters can improve a machine learning model's accuracy.
  • What are Hyperparameters?

    • Hyperparameters are parameters set before training the model.

    • They control model behavior and learning process, unlike model parameters (like weights) which are learned during training.

    Examples:

    Model              Hyperparameter   Description
    -----------------  ---------------  ----------------------------------------
    Linear Regression  None (simple)    No major hyperparameters in basic form
    Decision Tree      max_depth        Maximum depth of the tree
    Random Forest      n_estimators     Number of trees in the forest
    KNN                n_neighbors      Number of nearest neighbors to consider
    SVM                C, kernel        Regularization strength and kernel type
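
    The distinction between hyperparameters (set before training) and model parameters (learned during training) can be seen directly in code. A minimal sketch, assuming scikit-learn is installed; the specific values are illustrative:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Hyperparameter: chosen BEFORE training, when the model is constructed
tree = DecisionTreeClassifier(max_depth=3, random_state=42)

# Model parameters (the split thresholds inside the tree) are learned here
tree.fit(X, y)

print("max_depth (hyperparameter):", tree.get_params()["max_depth"])
print("tree depth after fitting:", tree.get_depth())
```

    Note that the fitted depth can be less than max_depth; the hyperparameter is an upper bound the learner must respect, not a value it learns.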


    Grid Search Concept

    • Grid Search is a method that tries every combination of the specified hyperparameter values.

    • Helps find the hyperparameter combination that maximizes model performance.

    • Often combined with cross-validation so each combination's score is not tied to a single train/test split.

    Example: For a Decision Tree:

    • max_depth: [3, 5, 7]

    • min_samples_split: [2, 5, 10]

    • Grid Search tries all 3×3 = 9 combinations
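
    The enumeration of all 9 combinations can be sketched with the standard library alone (no scikit-learn needed for this step):

```python
from itertools import product

param_grid = {
    "max_depth": [3, 5, 7],
    "min_samples_split": [2, 5, 10],
}

# Every pairing of the two value lists: 3 x 3 = 9 candidate settings
combinations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]

print(len(combinations))   # 9
print(combinations[0])     # {'max_depth': 3, 'min_samples_split': 2}
```

    scikit-learn provides the same enumeration as sklearn.model_selection.ParameterGrid, which GridSearchCV uses internally.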


    Model Selection

    • After hyperparameter tuning, select the model with:

      • Best cross-validated performance

      • Balanced bias-variance tradeoff

    • Helps prevent overfitting/underfitting
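
    Model selection by cross-validated score can be sketched as follows (a minimal illustration, assuming scikit-learn; the candidate models and depths are chosen only to show an underfitting vs. better-tuned contrast):

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

candidates = {
    "shallow tree (may underfit)": DecisionTreeClassifier(max_depth=1, random_state=42),
    "tuned tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

# Score each candidate by its mean 5-fold cross-validated accuracy
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Select the candidate with the best average score
best = max(scores, key=scores.get)
print("Best model:", best)
```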


    Cross-Validated Grid Search

    • Combines Grid Search + Cross-Validation

    • Steps:

      1. Define parameter grid

      2. For each combination, perform K-Fold cross-validation

      3. Compute average metric (e.g., accuracy, R²)

      4. Select combination with best average score
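
    The four steps above can be written out by hand to show what a cross-validated grid search actually does (a minimal sketch using a Decision Tree and scikit-learn's cross_val_score; GridSearchCV automates exactly this loop):

```python
from itertools import product
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Step 1: define the parameter grid
param_grid = {"max_depth": [3, 5, 7], "min_samples_split": [2, 5, 10]}

best_score, best_params = -1.0, None
for values in product(*param_grid.values()):
    params = dict(zip(param_grid, values))
    model = DecisionTreeClassifier(random_state=42, **params)
    # Steps 2-3: K-fold cross-validation, then average the metric
    mean_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    # Step 4: keep the combination with the best average score
    if mean_acc > best_score:
        best_score, best_params = mean_acc, params

print("Best params:", best_params)
print("Best mean accuracy:", round(best_score, 4))
```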

    Python Example: Grid Search with Cross-Validation

This example loads the Iris dataset, defines a Random Forest classifier, and builds a grid over n_estimators, max_depth, and min_samples_split. GridSearchCV then scores all 3 × 3 × 2 = 18 combinations with 5-fold cross-validation and reports the best hyperparameters along with the best cross-validated accuracy.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Step 1: Load Dataset
iris = load_iris()
X, y = iris.data, iris.target

# Step 2: Define Model
rf = RandomForestClassifier(random_state=42)

# Step 3: Define Hyperparameter Grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [3, 5, None],
    'min_samples_split': [2, 5]
}

# Step 4: Grid Search with 5-Fold Cross Validation
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy')

# Step 5: Fit Grid Search
grid_search.fit(X, y)

# Step 6: Best Parameters & Score
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Accuracy:", grid_search.best_score_)
  • Output: the script prints the best hyperparameter combination and its mean 5-fold accuracy (the exact parameters found may vary by scikit-learn version), e.g.:

    Best Hyperparameters: { ... }
    Best Cross-Validated Accuracy: 0.9666666666666668