
Linear Regression

  • This module explains Linear Regression in Machine Learning. You will learn simple and multiple linear regression, model assumptions, the cost function (MSE), and how gradient descent is used to optimize the model.
  • Simple Linear Regression (SLR)

    Simple Linear Regression is used when we predict one dependent variable (Y) using one independent variable (X).

    Equation

    Y = mX + c

    • m = slope (how much Y changes when X increases by 1 unit)

    • c = intercept (value of Y when X = 0)

    Example

    Predicting house price based on area.

    Example:

Simple Linear Regression – Area vs Price Prediction

This code demonstrates how to implement Simple Linear Regression using Python. It creates a small dataset of house areas and prices, trains a linear regression model using sklearn, predicts prices, prints the slope and intercept of the regression equation, and visualizes the result with a scatter plot and regression line using matplotlib.

# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Step 2: Create Dataset (Area vs Price)
# Area (independent variable - X)
X = np.array([500, 800, 1000, 1200, 1500]).reshape(-1, 1)

# Price (dependent variable - Y)
y = np.array([100000, 150000, 200000, 230000, 300000])

# Step 3: Create Model
model = LinearRegression()

# Step 4: Train Model
model.fit(X, y)

# Step 5: Predict
predicted_price = model.predict(X)

# Step 6: Print Equation Values
print("Slope (m):", model.coef_[0])
print("Intercept (c):", model.intercept_)

# Step 7: Plot Graph
plt.scatter(X, y, color='blue')  # actual data
plt.plot(X, predicted_price, color='red')  # regression line
plt.xlabel("Area")
plt.ylabel("Price")
plt.title("Simple Linear Regression")
plt.show()
  • Output:

    Slope (m): 199.99999999999991

    Intercept (c): -3999.9999999999127

  • Multiple Linear Regression (MLR)

    Multiple Linear Regression is used when we predict one dependent variable (Y) using multiple independent variables (X1, X2, X3...).

    Equation

    Y = b0 + b1X1 + b2X2 + b3X3

    • b0 = intercept

    • b1, b2, b3 = coefficients (how much Y changes per unit change in each variable, holding the others fixed)

    Example

    Predicting house price based on:

    • Area

    • Bedrooms

    • Age of house

    Example:

Multiple Linear Regression – House Price Prediction

This code demonstrates Multiple Linear Regression using Python. It trains a model with multiple independent variables (Area, Bedrooms, and Age) to predict house prices. After training the model using sklearn, it predicts the price of a new house and prints the coefficients (feature impact) and intercept of the regression equation.

import numpy as np
from sklearn.linear_model import LinearRegression

# Independent variables: [Area, Bedrooms, Age]
X = np.array([
    [1000, 2, 5],
    [1500, 3, 3],
    [2000, 4, 2],
    [2500, 4, 1]
])

# Dependent variable: Price
y = np.array([200000, 300000, 400000, 500000])

# Create model
model = LinearRegression()

# Train model
model.fit(X, y)

# Predict price for new house
new_house = [[1800, 3, 2]]
predicted_price = model.predict(new_house)

print("Predicted Price:", predicted_price[0])
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
  • Output:

    Predicted Price: 359999.99999999994

    Coefficients: [2.00000000e+02 1.45519783e-11 5.92354806e-11]

    Intercept: -4.0745362639427185e-10

    In this toy dataset, price is exactly 200 × area, so the model assigns a coefficient of 200 to Area and coefficients of effectively zero to Bedrooms and Age.

  • Assumptions of Linear Regression

    Linear Regression works best when these assumptions are satisfied:

    1. Linearity

    The relationship between X and Y must be linear.

    2. Independence

    Observations should be independent of each other.

    3. Homoscedasticity

    Error variance should remain constant across all levels of X.

    4. Normality

    Errors should be normally distributed.

    5. No Multicollinearity (for MLR)

    Independent variables should not be highly correlated with each other.
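    As a quick sketch of how the multicollinearity assumption can be checked (reusing the toy [Area, Bedrooms, Age] data from the MLR example above), a correlation matrix of the independent variables highlights strongly correlated pairs:

```python
import numpy as np

# Feature matrix: [Area, Bedrooms, Age]
X = np.array([
    [1000, 2, 5],
    [1500, 3, 3],
    [2000, 4, 2],
    [2500, 4, 1]
])

# Correlation matrix between the feature columns
corr = np.corrcoef(X, rowvar=False)
print(corr)
```

    Off-diagonal values near +1 or −1 signal multicollinearity; here Area and Bedrooms are strongly correlated, so in a real project you might drop one of them or combine them.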


    Cost Function (MSE – Mean Squared Error)

    The goal of the model is to minimize the error.

    Formula

    MSE = (1/n) Σ (yᵢ − ŷᵢ)²

    • yᵢ = actual value
    • ŷᵢ = predicted value

    Example

    If:

    Actual value = 80
    Predicted value = 75

    Error = 80 − 75 = 5
    Squared error = 25

    If we have multiple values:

    Example:

Mean Squared Error (MSE) – Model Error Calculation

This code calculates Mean Squared Error (MSE) to measure how much the predicted values differ from the actual values.

import numpy as np

actual = np.array([80, 70, 60])
predicted = np.array([75, 72, 58])

mse = np.mean((actual - predicted) ** 2)
print("MSE:", mse)
  • Output:

    MSE: 11.0

    Why Squared?

    • Removes negative values

    • Penalizes large errors more
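    A small sketch of why squaring penalizes large errors more: ten errors of size 1 and one error of size 10 have the same total absolute error, but very different squared penalties.

```python
import numpy as np

small_errors = np.array([1.0] * 10)   # ten errors of size 1
one_big_error = np.array([10.0])      # a single error of size 10

# Same total absolute error (10), very different squared penalties
print("Sum of squares (ten 1s):", np.sum(small_errors ** 2))   # 10.0
print("Sum of squares (one 10):", np.sum(one_big_error ** 2))  # 100.0
```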

    Gradient Descent

    Gradient Descent is an optimization algorithm used to minimize the cost function.

    Working Steps

    1. Start with random slope and intercept
    2. Calculate error
    3. Update slope and intercept
    4. Repeat until error becomes minimum

    Update Formula

    m = m − α (∂MSE/∂m)

    c = c − α (∂MSE/∂c)

    • α (alpha) = learning rate

      Learning Rate

      • Too high → may overshoot the minimum

      • Too low → training becomes slow
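      As a rough sketch of these two failure modes, the helper below runs gradient descent on the toy data X = [1, 2, 3, 4], y = [2, 4, 6, 8] with different learning rates (the specific rates chosen here are illustrative) and reports the final MSE:

```python
import numpy as np

X = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 8])
n = len(X)

def final_mse(learning_rate, steps=100):
    """Run gradient descent for a fixed number of steps and return the final MSE."""
    m, c = 0.0, 0.0
    for _ in range(steps):
        y_pred = m * X + c
        dm = (-2 / n) * np.sum(X * (y - y_pred))
        dc = (-2 / n) * np.sum(y - y_pred)
        m -= learning_rate * dm
        c -= learning_rate * dc
    return np.mean((y - (m * X + c)) ** 2)

print("alpha=0.01  :", final_mse(0.01))    # converges: error close to zero
print("alpha=0.0001:", final_mse(0.0001))  # too low: error still large after 100 steps
print("alpha=0.2   :", final_mse(0.2))     # too high: updates overshoot and the error explodes
```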

      Simple Gradient Descent Example (From Scratch)

      This code implements gradient descent from scratch for simple linear regression on the data X = [1, 2, 3, 4], y = [2, 4, 6, 8] (true relationship: y = 2x). Starting from m = 0 and c = 0, it repeatedly computes the predictions, the gradients of the MSE with respect to m and c, and updates both parameters. After 1000 iterations the learned slope (≈ 1.99) is close to the true slope 2 and the intercept (≈ 0.03) is close to 0.

      import numpy as np
      # Data
      X = np.array([1, 2, 3, 4])
      y = np.array([2, 4, 6, 8])
      
      # Initialize parameters
      m = 0
      c = 0
      learning_rate = 0.01
      n = len(X)
      
      # Gradient Descent
      for i in range(1000):
          y_pred = m * X + c
          
          dm = (-2/n) * sum(X * (y - y_pred))
          dc = (-2/n) * sum(y - y_pred)
          
          m = m - learning_rate * dm
          c = c - learning_rate * dc
      
      print("Slope (m):", m)
      print("Intercept (c):", c)
      • Output:

        Slope (m): 1.9896587550255742

        Intercept (c): 0.030404521305361965
