Hyperparameter Tuning (Basic Introduction)
- This lesson introduces hyperparameter tuning and explains how adjusting hyperparameters can improve machine learning model accuracy.
What are Hyperparameters?
Hyperparameters are parameters set before training the model.
They control the model's behavior and learning process, unlike model parameters (such as weights), which are learned during training.
Examples: learning rate, tree depth (max_depth), number of trees in a forest (n_estimators), and k in K-Nearest Neighbors.
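The distinction is visible directly in scikit-learn: hyperparameters are passed to the model's constructor before training, while learned parameters appear as fitted attributes only after calling fit. A minimal sketch:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Hyperparameters: C (regularization strength) and max_iter, chosen before training
model = LogisticRegression(C=1.0, max_iter=200)

# Model parameters: coefficients are learned from the data during fit
model.fit(X, y)
print("Hyperparameter C:", model.C)
print("Learned coefficient matrix shape:", model.coef_.shape)
```

Note that `model.coef_` does not exist until after `fit` is called, which is exactly what separates learned parameters from hyperparameters.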
Grid Search Concept
Grid Search is a method that tries every combination of the specified hyperparameter values.
It helps find the hyperparameter values that maximize model performance.
It is often combined with cross-validation so the chosen values do not overfit a single train/test split.
Example: For a Decision Tree:
max_depth: [3, 5, 7]
min_samples_split: [2, 5, 10]
Grid Search tries all 3×3 = 9 combinations
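The exhaustive enumeration behind Grid Search is just the Cartesian product of the value lists, which can be sketched with itertools.product over the Decision Tree grid above:

```python
from itertools import product

# Hyperparameter grid from the Decision Tree example above
param_grid = {
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 5, 10],
}

# Grid Search enumerates the Cartesian product of all value lists
combinations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]

print(len(combinations))   # 9 combinations (3 x 3)
print(combinations[0])     # {'max_depth': 3, 'min_samples_split': 2}
```

This is why grid size grows multiplicatively: adding one more list of 3 values would triple the number of models to train.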
Model Selection
After hyperparameter tuning, select the model with:
Best cross-validated performance
Balanced bias-variance tradeoff
Selecting this way helps prevent both overfitting and underfitting
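As an illustration of the bias-variance side of model selection, one can compare a very shallow tree (prone to underfitting) with an unrestricted one (prone to overfitting) by their cross-validated scores and keep whichever averages higher; the depth values here are illustrative choices, not recommendations:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

candidates = {
    'shallow': DecisionTreeClassifier(max_depth=1, random_state=42),   # high bias: a single split
    'deep': DecisionTreeClassifier(max_depth=None, random_state=42),   # higher variance: grows until pure
}

# Select the candidate with the best mean cross-validated accuracy
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in candidates.items()}
best_name = max(scores, key=scores.get)
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
print("Selected:", best_name)
```

On this dataset the one-split tree cannot separate all three classes, so cross-validation favors the deeper model; on noisier data the comparison can go the other way.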
Cross-Validated Grid Search
Combines Grid Search + Cross-Validation
Steps:
Define parameter grid
For each combination, perform K-Fold cross-validation
Compute average metric (e.g., accuracy, R²)
Select combination with best average score
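The four steps above can be sketched by hand with ParameterGrid and cross_val_score, before GridSearchCV (shown in the next section) automates the same loop:

```python
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Step 1: Define parameter grid
grid = ParameterGrid({'max_depth': [3, 5, 7], 'min_samples_split': [2, 5, 10]})

best_score, best_params = -1.0, None
for params in grid:
    model = DecisionTreeClassifier(random_state=42, **params)
    # Step 2: K-Fold cross-validation for this combination
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    # Step 3: Average the metric across folds
    mean_score = scores.mean()
    # Step 4: Keep the combination with the best average score
    if mean_score > best_score:
        best_score, best_params = mean_score, params

print("Best params:", best_params)
print("Best mean accuracy:", round(best_score, 4))
```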
Python Example: Grid Search with Cross-Validation
This example demonstrates hyperparameter tuning with GridSearchCV using a Random Forest classifier. The code loads the Iris dataset, defines a grid of hyperparameters (n_estimators, max_depth, and min_samples_split), then uses 5-fold cross-validation to evaluate every parameter combination and prints the best hyperparameters along with the highest cross-validated accuracy.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
# Step 1: Load Dataset
iris = load_iris()
X, y = iris.data, iris.target
# Step 2: Define Model
rf = RandomForestClassifier(random_state=42)
# Step 3: Define Hyperparameter Grid
param_grid = {
'n_estimators': [50, 100, 150],
'max_depth': [3, 5, None],
'min_samples_split': [2, 5]
}
# Step 4: Grid Search with 5-Fold Cross Validation
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy')
# Step 5: Fit Grid Search
grid_search.fit(X, y)
# Step 6: Best Parameters & Score
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Accuracy:", grid_search.best_score_)
Output (exact values depend on the scikit-learn version):
Best Hyperparameters: {'max_depth': ..., 'min_samples_split': ..., 'n_estimators': ...}
Best Cross-Validated Accuracy: 0.9666666666666668
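If the individual fold scores for the winning combination are needed, GridSearchCV keeps them in cv_results_, with best_index_ pointing at the winning row. A self-contained sketch (using a smaller grid than the example above to keep it quick):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={'n_estimators': [50, 100], 'max_depth': [3, None]},
    cv=5,
    scoring='accuracy',
)
grid_search.fit(X, y)

# cv_results_ stores one test score per fold for every combination;
# best_index_ identifies the winning combination's row
i = grid_search.best_index_
fold_scores = [grid_search.cv_results_[f'split{k}_test_score'][i] for k in range(5)]
print("Accuracy for each fold:", np.round(fold_scores, 4))
print("Average Accuracy:", grid_search.best_score_)
```

The average of the five fold scores equals best_score_, which is the number GridSearchCV uses to rank combinations.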