Hyperparameter Tuning

  • This lesson explains hyperparameter tuning methods used to optimize machine learning and deep learning models.
  • Learning Rate Scheduling

    Learning Rate Scheduling = changing the learning rate during training.

    Instead of using a fixed learning rate, we adjust it over time.

    Why?

    If learning rate:

    • Too high → unstable training

    • Too low → slow convergence

    Scheduling helps:

    • Faster convergence

    • Better final accuracy

    Types of Learning Rate Scheduling

    Step Decay

    Reduce learning rate after fixed epochs.

    Example:
    0.01 → 0.001 → 0.0001

    Exponential Decay

    η_t = η_0 · e^(−k·t)

    where η_0 is the initial learning rate, k is the decay rate, and t is the training epoch (or step).

    Learning rate decreases smoothly.

    Reduce on Plateau

    Reduce learning rate when validation loss stops improving.

    Cosine Annealing

    The learning rate decreases smoothly, following a cosine curve from its initial value down to a minimum.
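The three formula-based schedules above can be sketched as plain Python functions (a minimal illustration; the drop factor, decay rate k, and epoch counts are made-up example values):

```python
import math

def step_decay(initial_lr, epoch, drop=0.1, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

def exponential_decay(initial_lr, epoch, k=0.1):
    """Smooth decay: eta_t = eta_0 * exp(-k * t)."""
    return initial_lr * math.exp(-k * epoch)

def cosine_annealing(initial_lr, epoch, total_epochs, min_lr=0.0):
    """Learning rate follows half a cosine wave from initial_lr down to min_lr."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1 + cos)
```

With `drop=0.1` and `epochs_per_drop=10`, step decay reproduces the 0.01 → 0.001 → 0.0001 sequence from the example above at epochs 0, 10, and 20.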

    Code Example (TensorFlow)

Using ReduceLROnPlateau in TensorFlow Keras to Adjust Learning Rate

This Python example demonstrates how to implement ReduceLROnPlateau in TensorFlow Keras to dynamically adjust the learning rate during training. The callback monitors the validation loss and reduces the learning rate by a factor (here 0.5) if the loss does not improve for a specified number of epochs (patience=2). This technique helps improve convergence and can prevent the model from getting stuck in local minima.

from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever validation loss stops improving
lr_scheduler = ReduceLROnPlateau(
    monitor='val_loss',  # quantity to watch
    factor=0.5,          # multiply the learning rate by this factor
    patience=2           # epochs with no improvement before reducing
)

# `model`, `X_train`, `y_train`, `X_val`, `y_val` are assumed to be defined
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          callbacks=[lr_scheduler])
  • Grid Search

    Grid Search tries all possible combinations of hyperparameters.

    Example:

    Learning Rate    Batch Size
    0.01             32
    0.01             64
    0.001            32
    0.001            64

    It tests all combinations.

    Advantages:

    • Exhaustive: guaranteed to find the best combination within the grid

    Disadvantages:

    • Very slow, since combinations multiply as hyperparameters are added

    • Expensive for deep learning
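The exhaustive enumeration can be sketched in a few lines of plain Python (the search space matches the example table above; the `evaluate` function is a made-up placeholder for training and scoring a model):

```python
from itertools import product

# Search space matching the example table above
grid = {
    'learning_rate': [0.01, 0.001],
    'batch_size': [32, 64],
}

def evaluate(config):
    # Placeholder: in practice, train a model with `config`
    # and return its validation accuracy
    return -abs(config['learning_rate'] - 0.001)  # pretend 0.001 is best

# Enumerate every combination (2 x 2 = 4 here)
keys = list(grid)
configs = [dict(zip(keys, values)) for values in product(*grid.values())]
best = max(configs, key=evaluate)
print(len(configs), best)
```

In practice, libraries such as scikit-learn's `GridSearchCV` wrap this loop together with cross-validation.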


    Random Search

    Instead of trying all combinations, it selects random combinations.

    Example:
    Randomly try 20 configurations out of 100 possible.

    Advantages:

    • Faster than Grid Search

    • Often finds a good solution

    • More efficient use of a limited trial budget

    Research shows Random Search can be more effective than Grid Search when only a few hyperparameters are actually important.
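The sampling step can be sketched as follows (the search space values are made up; 4 × 5 × 5 = 100 possible configurations, of which 20 are tried, matching the example above):

```python
import random

# Hypothetical search space: 4 * 5 * 5 = 100 possible configurations
space = {
    'learning_rate': [0.1, 0.01, 0.001, 0.0001],
    'batch_size': [16, 32, 64, 128, 256],
    'dropout': [0.0, 0.1, 0.2, 0.3, 0.5],
}

random.seed(0)  # only for reproducibility of this sketch

def sample_config(space):
    """Pick one value at random for each hyperparameter."""
    return {name: random.choice(values) for name, values in space.items()}

all_combinations = 1
for values in space.values():
    all_combinations *= len(values)

# Try only 20 random configurations instead of all 100
trials = [sample_config(space) for _ in range(20)]
print(f"trying {len(trials)} of {all_combinations} possible configurations")
```

scikit-learn's `RandomizedSearchCV` implements this idea, including sampling from continuous distributions rather than fixed lists.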


    Model Selection

    Model Selection = choosing the best model based on validation performance.

    Steps:

    1. Split the dataset into:

    • Train

    • Validation

    • Test

    2. Train multiple models.

    3. Compare metrics:

    • Accuracy

    • Precision

    • Recall

    • F1-score

    4. Choose the model with the best validation performance.
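The final selection step reduces to a simple comparison. A minimal sketch, assuming the validation accuracies below were collected after training three hypothetical models:

```python
# Hypothetical validation accuracies gathered after training each candidate
val_scores = {
    'logistic_regression': 0.87,
    'random_forest': 0.91,
    'small_cnn': 0.93,
}

# Pick the model whose validation accuracy is highest
best_model = max(val_scores, key=val_scores.get)
print(best_model)  # small_cnn
```

The held-out test set is then used only once, to report the chosen model's final performance.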

    Cross-Validation

    Instead of one validation split, use k-fold cross-validation: split the data into k folds, and let each fold serve once as the validation set while the model trains on the remaining k − 1 folds.

    This gives a more reliable evaluation but is expensive for deep learning, since the model is trained k times.
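The fold construction can be sketched in plain Python (in practice scikit-learn's `KFold` does this, with shuffling and stratification options):

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k roughly equal folds."""
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # Spread any remainder across the first folds
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

folds = k_fold_indices(10, 5)
# Each fold serves once as the validation set; the rest are training data
for val_fold in folds:
    train_idx = [i for f in folds if f is not val_fold for i in f]
    # train on train_idx, evaluate on val_fold, then average the k scores
```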