Linear Regression
- This module explains Linear Regression in Machine Learning. You will learn simple and multiple linear regression, model assumptions, the cost function (MSE), and how gradient descent is used to optimize the model.
Simple Linear Regression (SLR)
Simple Linear Regression is used when we predict one dependent variable (Y) using one independent variable (X).
Equation
Y = mX + c
m = slope (how much Y changes when X increases by 1 unit)
c = intercept (value of Y when X = 0)
Example
Predicting house price based on area.
Example:
Simple Linear Regression – Area vs Price Prediction
This code demonstrates how to implement Simple Linear Regression using Python. It creates a small dataset of house areas and prices, trains a linear regression model using sklearn, predicts prices, prints the slope and intercept of the regression equation, and visualizes the result with a scatter plot and regression line using matplotlib.
# Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Step 2: Create Dataset (Area vs Price)
# Area (independent variable - X)
X = np.array([500, 800, 1000, 1200, 1500]).reshape(-1, 1)
# Price (dependent variable - Y)
y = np.array([100000, 150000, 200000, 230000, 300000])
# Step 3: Create Model
model = LinearRegression()
# Step 4: Train Model
model.fit(X, y)
# Step 5: Predict
predicted_price = model.predict(X)
# Step 6: Print Equation Values
print("Slope (m):", model.coef_[0])
print("Intercept (c):", model.intercept_)
# Step 7: Plot Graph
plt.scatter(X, y, color='blue') # actual data
plt.plot(X, predicted_price, color='red') # regression line
plt.xlabel("Area")
plt.ylabel("Price")
plt.title("Simple Linear Regression")
plt.show()
Output:
Slope (m): 199.99999999999991
Intercept (c): -3999.9999999999127
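With the fitted slope (≈ 200) and intercept (≈ −4000), the model can price a house it has not seen. A minimal sketch, using a hypothetical 1100 sq ft area, showing that model.predict and the equation Y = mX + c give the same answer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([500, 800, 1000, 1200, 1500]).reshape(-1, 1)
y = np.array([100000, 150000, 200000, 230000, 300000])

model = LinearRegression()
model.fit(X, y)

# Predict the price of a hypothetical 1100 sq ft house
new_area = np.array([[1100]])
print("Predicted price:", model.predict(new_area)[0])  # ≈ 216000

# The same prediction computed by hand from the equation Y = mX + c
print("By hand:", model.coef_[0] * 1100 + model.intercept_)
```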
Multiple Linear Regression (MLR)
Multiple Linear Regression is used when we predict one dependent variable (Y) using multiple independent variables (X1, X2, X3...).
Equation
Y = b₀ + b₁X₁ + b₂X₂ + b₃X₃
b₀ = intercept
b₁, b₂, b₃ = coefficients
Example
Predicting house price based on:
Area
Bedrooms
Age of house
Example:
Multiple Linear Regression – House Price Prediction
This code demonstrates Multiple Linear Regression using Python. It trains a model with multiple independent variables (Area, Bedrooms, and Age) to predict house prices. After training the model using sklearn, it predicts the price of a new house and prints the coefficients (feature impact) and intercept of the regression equation.
import numpy as np
from sklearn.linear_model import LinearRegression
# Independent variables: [Area, Bedrooms, Age]
X = np.array([
    [1000, 2, 5],
    [1500, 3, 3],
    [2000, 4, 2],
    [2500, 4, 1]
])
# Dependent variable: Price
y = np.array([200000, 300000, 400000, 500000])
# Create model
model = LinearRegression()
# Train model
model.fit(X, y)
# Predict price for new house
new_house = [[1800, 3, 2]]
predicted_price = model.predict(new_house)
print("Predicted Price:", predicted_price[0])
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
Output:
Predicted Price: 359999.99999999994
Coefficients: [2.00000000e+02 1.45519783e-11 5.92354806e-11]
Intercept: -4.0745362639427185e-10
Assumptions of Linear Regression
Linear Regression works best when these assumptions are satisfied:
1. Linearity
The relationship between X and Y must be linear.
2. Independence
Observations should be independent of each other.
3. Homoscedasticity
Error variance should remain constant across all levels of X.
4. Normality
Errors should be normally distributed.
5. No Multicollinearity (for MLR)
Independent variables should not be highly correlated with each other.
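A quick way to check the multicollinearity assumption is to inspect the pairwise correlations between features. A minimal sketch using NumPy on the toy [Area, Bedrooms, Age] matrix from above (the 0.9 threshold is a common rule of thumb, not a fixed standard):

```python
import numpy as np

# Feature matrix: [Area, Bedrooms, Age]
X = np.array([
    [1000, 2, 5],
    [1500, 3, 3],
    [2000, 4, 2],
    [2500, 4, 1],
], dtype=float)

# Correlation matrix across features (columns)
corr = np.corrcoef(X, rowvar=False)
print(corr)

# Flag feature pairs whose |correlation| exceeds 0.9
high = np.argwhere(np.abs(np.triu(corr, k=1)) > 0.9)
print("Highly correlated feature pairs (indices):", high)
```

On this tiny dataset every pair is strongly correlated, which is exactly the situation that makes MLR coefficients unstable.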
Cost Function (MSE – Mean Squared Error)
The goal of the model is to minimize the error.
Formula
MSE = (1/n) Σ (yᵢ − ŷᵢ)²
- yᵢ = actual value
- ŷᵢ = predicted value
Example
If:
Actual value = 80
Predicted value = 75
Error = 80 − 75 = 5
Squared error = 25
If we have multiple values:
Example:
Mean Squared Error (MSE) – Model Error Calculation
This code calculates Mean Squared Error (MSE) to measure how much the predicted values differ from the actual values.
import numpy as np
actual = np.array([80, 70, 60])
predicted = np.array([75, 72, 58])
mse = np.mean((actual - predicted) ** 2)
print("MSE:", mse)
Output:
MSE: 11.0
Why Squared?
Squaring removes negative signs, so positive and negative errors cannot cancel out
Large errors are penalized more heavily than small ones
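The effect of squaring is easy to see by comparing squared errors with absolute errors on the same data (Mean Absolute Error is shown only for contrast; it is not part of the model above):

```python
import numpy as np

actual = np.array([80, 70, 60])
predicted = np.array([75, 72, 58])  # errors: 5, -2, 2

errors = actual - predicted
mse = np.mean(errors ** 2)
mae = np.mean(np.abs(errors))

print("Squared errors:", errors ** 2)      # [25  4  4] – the error of 5 dominates
print("Absolute errors:", np.abs(errors))  # [5 2 2]
print("MSE:", mse)  # 11.0
print("MAE:", mae)  # 3.0
```

The single error of 5 contributes 25 out of 33 to the total squared error, but only 5 out of 9 to the total absolute error.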
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the cost function.
Working Steps
- Start with a random (or zero) slope and intercept
- Calculate the error (cost)
- Update the slope and intercept in the direction that reduces the error
- Repeat until the error stops decreasing
Update Formula
m = m − α · ∂MSE/∂m
c = c − α · ∂MSE/∂c
α (alpha) = learning rate
Learning Rate
- Too high → may overshoot minimum
- Too low → training becomes slow
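Both failure modes can be seen on a toy objective. A minimal sketch (the function f(m) = (m − 2)², with gradient 2(m − 2), is chosen only for illustration; its minimum is at m = 2):

```python
# Minimize f(m) = (m - 2)**2 with gradient descent; df/dm = 2*(m - 2)
def run(learning_rate, steps=50):
    m = 0.0
    for _ in range(steps):
        m -= learning_rate * 2 * (m - 2)
    return m

print("alpha = 0.1  :", run(0.1))    # converges near the minimum m = 2
print("alpha = 1.5  :", run(1.5))    # overshoots; |m| blows up
print("alpha = 0.001:", run(0.001))  # moves toward 2, but very slowly
```

With α = 1.5 each update multiplies the distance to the minimum by −2, so the iterates diverge instead of settling.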
Simple Gradient Descent Example (From Scratch)
Gradient Descent – Model Output Explanation
The output below shows the learned values of the regression equation. Slope (m) ≈ 1.99: for every 1-unit increase in X, the predicted value increases by approximately 1.99 units. Intercept (c) ≈ 0.03: the predicted value when X = 0. These are close to the true values m = 2 and c = 0 for this data.
import numpy as np
# Data
X = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 8])
# Initialize parameters
m = 0
c = 0
learning_rate = 0.01
n = len(X)
# Gradient Descent
for i in range(1000):
    y_pred = m * X + c
    dm = (-2/n) * sum(X * (y - y_pred))
    dc = (-2/n) * sum(y - y_pred)
    m = m - learning_rate * dm
    c = c - learning_rate * dc
print("Slope (m):", m)
print("Intercept (c):", c)
Output:
Slope (m): 1.9896587550255742
Intercept (c): 0.030404521305361965
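As a sanity check, the gradient descent result can be compared with a closed-form least-squares fit (here via np.polyfit), which for this data should give values very close to m = 2 and c = 0:

```python
import numpy as np

X = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 8])

# Closed-form least-squares fit of a degree-1 polynomial
m, c = np.polyfit(X, y, 1)
print("Slope (m):", m)      # 2.0 up to floating point
print("Intercept (c):", c)  # ~0.0
```

The small gap between the gradient descent estimates (1.9896, 0.0304) and the exact solution would shrink further with more iterations or a larger learning rate.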