Regularization Techniques

  • This lesson introduces regularization techniques used to prevent overfitting and improve neural network generalization.
  • L1 & L2 Regularization

    Regularization works by adding a penalty term to the loss function.

    New Loss:

    Loss = Original Loss + Regularization Term

    L1 Regularization (Lasso)

    Adds absolute value of weights:

    Loss = L + λ Σ |W|

    Effect:

    • Forces some weights to become exactly 0

    • Performs feature selection

    • Creates a sparse model

    Useful when:

    • Many irrelevant features

    • You want a simpler model

    L2 Regularization (Ridge)

    Adds squared weights:

    Loss = L + λ Σ W²

    Effect:

    • Reduces weight magnitude

    • Does not make weights exactly zero

    • Produces a smoother model

    L2 is the most commonly used regularization in deep learning.


    L1 vs L2

    Feature          | L1                | L2
    -----------------|-------------------|--------------
    Makes weights 0  | ✅ Yes            | ❌ No
    Sparse model     | ✅ Yes            | ❌ No
    Stability        | Less stable       | More stable
    Common usage     | Feature selection | Deep learning
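
The penalties above can be attached to a Keras layer via `kernel_regularizer`. A minimal sketch (the layer sizes and λ = 0.01 are illustrative choices, not values from the lesson):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# L1 penalty: lambda * sum(|W|) -- pushes some weights to exactly 0 (sparse model)
l1_layer = layers.Dense(64, activation='relu',
                        kernel_regularizer=regularizers.l1(0.01))

# L2 penalty: lambda * sum(W^2) -- shrinks weights without zeroing them
l2_layer = layers.Dense(64, activation='relu',
                        kernel_regularizer=regularizers.l2(0.01))

model = tf.keras.Sequential([
    l1_layer,
    l2_layer,
    layers.Dense(10, activation='softmax'),
])
```

Each regularized layer contributes its penalty term to `model.losses`, and Keras adds those terms to the training loss automatically.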


    Dropout

    Dropout randomly turns off some neurons during training.

    Example:
    Dropout rate = 0.5
    → 50% of neurons are randomly ignored in each batch

    Why does it work?

    Prevents neurons from:

    • Becoming dependent on each other

    • Memorizing training data

    Forces network to learn robust features.

    During Testing:

    All neurons are used (no dropout).

    Code Example

Neural Network with Dropout Example in Python using TensorFlow Keras

This Python example demonstrates how to add Dropout regularization to a neural network using TensorFlow Keras. The model consists of a Dense hidden layer with ReLU activation, a Dropout layer to prevent overfitting by randomly deactivating 50% of neurons during training, and an output layer with softmax activation for multi-class classification.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(128, activation='relu'),   # hidden layer
    layers.Dropout(0.5),                    # randomly drop 50% of units during training
    layers.Dense(10, activation='softmax')  # 10-class output
])
  • Early Stopping

    Instead of training for fixed epochs, stop training when:

    Validation loss starts increasing.

    Why?

    When:

    • Training loss ↓

    • Validation loss ↑

    It means overfitting has started.

    Code Example

Using Early Stopping in TensorFlow Keras to Prevent Overfitting

This Python example demonstrates how to implement Early Stopping during neural network training using TensorFlow Keras. The EarlyStopping callback monitors the validation loss and stops training if it does not improve for a specified number of epochs (patience=3). This helps prevent overfitting and saves training time by stopping the model once it stops learning on validation data.

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',  # watch the validation loss
    patience=3           # stop after 3 epochs with no improvement
)

model.fit(X_train, y_train,
          epochs=100,  # upper bound; early stopping usually halts much sooner
          validation_data=(X_val, y_val),
          callbacks=[early_stop])
  • Batch Normalization

    Batch Normalization normalizes layer inputs.

    X_normalized = (X − μ) / σ
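
    The formula can be checked in a few lines of NumPy. Here μ and σ are computed per feature over a made-up batch; the trainable scale/shift parameters (γ, β) and the small ε added to the denominator in real Batch Normalization are omitted:

```python
import numpy as np

X = np.array([[1.0, 20.0],
              [3.0, 40.0],
              [5.0, 60.0]])  # batch of 3 samples, 2 features

mu = X.mean(axis=0)     # per-feature batch mean
sigma = X.std(axis=0)   # per-feature batch standard deviation

X_normalized = (X - mu) / sigma
print(X_normalized.mean(axis=0))  # ~0 for each feature
print(X_normalized.std(axis=0))   # ~1 for each feature
```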

    Benefits:

    • Faster training

    • More stable gradients

    • Allows a higher learning rate

    • Reduces internal covariate shift

    Code Example

Using Batch Normalization in TensorFlow Keras Neural Networks

This Python snippet demonstrates how to apply Batch Normalization in a neural network using TensorFlow Keras. A Dense layer is followed by BatchNormalization and a separate ReLU activation layer. Batch Normalization normalizes the outputs of the previous layer, which helps stabilize and accelerate training while improving model performance.

layers.Dense(128),            # linear transform, no activation yet
layers.BatchNormalization(),  # normalize the pre-activations over the batch
layers.Activation('relu')     # non-linearity applied after normalization