Ridge and Lasso Regression

  • This module explains Ridge and Lasso Regression, two techniques that reduce overfitting in machine learning models through L2 and L1 regularization respectively, and introduces the bias-variance tradeoff.
  • Overfitting Problem

    What is Overfitting?

    Overfitting happens when a model:

    • Performs very well on training data

    • Performs poorly on testing/new data

    The model memorizes the training data instead of learning general patterns.

    Example

    Suppose we are predicting house prices.

    If the model:

    • Uses too many features

    • Fits noise in data

    • Creates a very complex curve

    Then it will perfectly fit training data but fail on new houses.

    Signs of Overfitting

    • Training accuracy = High

    • Testing accuracy = Low

    • Model too complex
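A minimal sketch of these symptoms, using a deliberately over-flexible polynomial model on a small synthetic dataset (the data, degree, and seed are illustrative choices, not part of the module's examples):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.3, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Degree-12 polynomial on 15 training points: flexible enough to fit the noise
overfit = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
overfit.fit(X_train, y_train)

print("Training R^2:", overfit.score(X_train, y_train))  # typically near 1
print("Testing  R^2:", overfit.score(X_test, y_test))    # noticeably lower
```

The large gap between training and testing scores is exactly the overfitting signature listed above.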


    Regularization Concept

    Regularization is a technique used to:

    • Reduce model complexity

    • Prevent overfitting

    • Penalize large coefficients

    Idea:

    Add a penalty term to the cost function.

    Original Cost Function (MSE):

    MSE = \frac{1}{n} \sum (y - \hat{y})^2

    Regularized Cost Function:

    Loss = MSE + Penalty

    This penalty shrinks coefficient values.
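As a quick illustration, the two common penalty terms can be computed by hand. The coefficients, data, and λ below are made up purely for this example:

```python
import numpy as np

# Hypothetical coefficients and data, chosen only for illustration
w = np.array([4.0, 0.5, 2.0])           # model coefficients
X = np.array([[1.0, 2.0, 0.5],
              [0.5, 1.0, 1.5]])
y = np.array([9.0, 6.0])
lam = 0.1                               # regularization strength λ

y_hat = X @ w
mse = np.mean((y - y_hat) ** 2)

l1_penalty = lam * np.sum(np.abs(w))    # L1 (Lasso-style) penalty
l2_penalty = lam * np.sum(w ** 2)       # L2 (Ridge-style) penalty

print("MSE:", mse)
print("L1-regularized loss:", mse + l1_penalty)
print("L2-regularized loss:", mse + l2_penalty)
```

Large coefficients inflate the penalty, so minimizing the regularized loss pushes the optimizer toward smaller coefficients.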


    L1 Regularization (Lasso Regression)

    Full Form:

    Least Absolute Shrinkage and Selection Operator

    Formula:

    Loss = MSE + \lambda \sum |w|

    • λ = regularization parameter (controls penalty strength)

    • w = model coefficients

    Key Feature:

    • Shrinks coefficients

    • Can make some coefficients exactly 0

    • Performs feature selection

    When to Use?

    • When many irrelevant features exist

    • When you want automatic feature selection

    Python Example

Lasso Regression Model Training and Evaluation in Python

This code demonstrates how to train a Lasso Regression model using Python. It generates sample data, splits it into training and testing sets, fits the Lasso model, and prints the model coefficients along with training and testing scores to evaluate performance.

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
import numpy as np

# Sample Data
X = np.random.rand(100, 5)
y = X @ np.array([5, 0, 3, 0, 2]) + np.random.randn(100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

print("Coefficients:", model.coef_)
print("Training Score:", model.score(X_train, y_train))
print("Testing Score:", model.score(X_test, y_test))
  • Output:

    Coefficients: [3.60521106 0.         1.07966069 0.         0.1975825 ]

    Training Score: 0.5994195947062018

    Testing Score: 0.4909136833444431


    You will notice some coefficients become exactly 0. (Exact values vary between runs because the data is random and no seed is set.)
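To see how the regularization strength controls this sparsity, here is a small sweep over alpha on synthetic data similar to the example above (seeded so the run is repeatable):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(42)
X = rng.rand(100, 5)
y = X @ np.array([5, 0, 3, 0, 2]) + rng.randn(100)

zero_counts = {}
for alpha in [0.01, 0.1, 1.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    # Larger alpha -> stronger shrinkage -> more coefficients driven to exactly 0
    zero_counts[alpha] = int(np.sum(lasso.coef_ == 0))
    print(f"alpha={alpha}: coefficients={np.round(lasso.coef_, 2)}, "
          f"zeros={zero_counts[alpha]}")
```

With a large enough alpha, even genuinely useful features get zeroed out, so alpha is usually tuned (for example with cross-validation) rather than guessed.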


    L2 Regularization (Ridge Regression)

    Formula:

    Loss = MSE + \lambda \sum w^2

    Key Feature:

    • Shrinks coefficients

    • Does NOT make them zero

    • Reduces impact of less important features

    When to Use?

    • When all features are important

    • When multicollinearity exists

    Python Example

Ridge Regression Model Training and Evaluation in Python

This code shows how to train a Ridge Regression model using Python, reusing the train/test split from the Lasso example above. The model is fitted on training data and then used to inspect coefficients and evaluate performance via training and testing scores. Ridge Regression helps reduce overfitting by applying L2 regularization to the model.

from sklearn.linear_model import Ridge

model = Ridge(alpha=0.1)
model.fit(X_train, y_train)

print("Coefficients:", model.coef_)
print("Training Score:", model.score(X_train, y_train))
print("Testing Score:", model.score(X_test, y_test))
  • Output:

    Coefficients: [4.61903088 0.26909111 2.27121045 0.51240214 1.45879947]

    Training Score: 0.708650666355346

    Testing Score: 0.5313172051848971
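In contrast to Lasso, increasing Ridge's alpha shrinks the whole coefficient vector toward zero without producing exact zeros. A small sketch on seeded synthetic data (values chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.rand(100, 5)
y = X @ np.array([5, 0, 3, 0, 2]) + rng.randn(100)

norms = {}
for alpha in [0.1, 10.0, 1000.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    # The L2 norm of the coefficients shrinks as alpha grows,
    # but no coefficient becomes exactly zero
    norms[alpha] = float(np.linalg.norm(ridge.coef_))
    print(f"alpha={alpha}: coef norm={norms[alpha]:.3f}, "
          f"exact zeros={int(np.sum(ridge.coef_ == 0))}")
```

This is why Ridge is preferred when all features carry some signal: it dampens them rather than discarding them.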


    Bias-Variance Tradeoff

    Understanding Ridge and Lasso requires understanding Bias & Variance.

    Bias

    Error due to an overly simple model.

    • Underfitting problem

    • High bias → Model too simple

    Example: using a straight line to fit clearly curved data.

    Variance

    Error due to an overly complex model.

    • Overfitting problem

    • High variance → Model too complex

    Tradeoff

    | Model Type        | Bias     | Variance |
    |-------------------|----------|----------|
    | Simple Model      | High     | Low      |
    | Complex Model     | Low      | High     |
    | Regularized Model | Balanced | Balanced |

    Goal:

    Find balance between bias and variance.

    Regularization:

    • Slightly increases bias

    • Reduces variance

    • Improves generalization
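The tradeoff can be sketched directly: the same high-degree polynomial model is fitted with a near-zero alpha (essentially unregularized, low bias / high variance) and with a moderate alpha (slightly more bias, less variance). All data and parameter choices here are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1)

scores = {}
for alpha in [1e-8, 1.0]:
    # Near-zero alpha behaves like unregularized OLS; alpha=1.0 adds bias
    model = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    scores[alpha] = (model.score(X_train, y_train),
                     model.score(X_test, y_test))
    print(f"alpha={alpha}: train R^2={scores[alpha][0]:.3f}, "
          f"test R^2={scores[alpha][1]:.3f}")
```

The regularized model fits the training data slightly worse (higher bias), which is the price paid for the reduced variance that typically improves test-set performance.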