Training Neural Networks
- This lesson explains how neural networks learn from data using forward propagation and backpropagation algorithms.
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function.
Basic Idea:
Move weights in the direction that reduces error
Use the gradient (derivative) of the loss function
Update Rule:
W = W − η ∂L/∂W
Where:
W = weight
η = learning rate
L = loss function
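The update rule above can be sketched on a toy one-weight loss. The loss L(W) = (W − 3)² and the starting point are arbitrary choices for illustration; its gradient is 2(W − 3).

```python
# Minimal sketch of the update rule W = W - eta * dL/dW,
# applied to the toy loss L(W) = (W - 3)^2.

def grad(W):
    return 2 * (W - 3)  # dL/dW for L = (W - 3)^2

W = 0.0    # initial weight (arbitrary)
eta = 0.1  # learning rate

for _ in range(100):
    W = W - eta * grad(W)  # the update rule from above

print(round(W, 4))  # converges toward the minimum at W = 3
```

Each step moves W a little way down the slope, so W approaches 3, where the gradient is zero.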
Intuition:
Imagine standing on a hill:
Gradient tells you slope direction
You move downhill to reach minimum loss
Learning Rate (η)
The learning rate controls how big a step we take during each weight update.
If Learning Rate is:
Too small → Very slow training
Too large → Overshoots minimum, unstable
Just right → Fast and stable convergence
Visualization Concept:
Small steps → Slow but stable
Large steps → Jump around the minimum
Choosing the correct learning rate is critical.
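The three cases can be sketched on the loss L(W) = W², whose gradient is 2W. The specific η values below are illustrative, not canonical.

```python
# Compare learning rates on L(W) = W^2 (gradient 2W), starting at W = 1.
# Convergence requires |1 - 2*eta| < 1; eta = 1.1 violates this and diverges.

def run(eta, steps=50):
    W = 1.0
    for _ in range(steps):
        W -= eta * 2 * W
    return abs(W)  # distance from the minimum at W = 0

small = run(0.001)  # too small: barely moves in 50 steps
good = run(0.1)     # just right: converges quickly
large = run(1.1)    # too large: overshoots and blows up

print(small, good, large)
```

The "too large" run actually moves *away* from the minimum at every step, which is what the overshooting instability looks like numerically.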
Batch vs Mini-Batch vs Stochastic Gradient Descent
Batch Gradient Descent
Computes gradient using full dataset
Computationally expensive for large data
Stochastic Gradient Descent (SGD)
Updates weights for every single example
Faster but noisy updates
Mini-Batch Gradient Descent
Uses small batches
Most practical and widely used
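A sketch of how mini-batch gradient descent slices a dataset into batches each epoch. The dataset size and batch size are arbitrary; shuffling before batching is the typical practice.

```python
# Slice a (shuffled) dataset into mini-batches.
# Batch size 1 would give SGD; batch size len(data) would give batch gradient descent.

import random

data = list(range(10))  # stand-in for (input, label) pairs
batch_size = 4

random.shuffle(data)  # shuffle once per epoch
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

for batch in batches:
    # compute the gradient on this batch only, then update the weights
    print(batch)
```

Note the last batch may be smaller than the rest (here, 2 examples), which most frameworks handle the same way.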
Backpropagation
Backpropagation is the algorithm that:
Calculates gradients
Sends error backward through network
Updates weights
Steps:
Forward Propagation → Compute prediction
Compute Loss
Backward Propagation → Compute gradients
Update Weights using Gradient Descent
Repeat for many epochs
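The steps above can be sketched end to end on a one-weight linear model y = w·x. The data, learning rate, and epoch count are illustrative; the loop mirrors forward pass → loss → gradient → update → repeat.

```python
# Training loop for y = w * x, fit with mean squared error.
# The data below was generated with a true weight of 2.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w, eta = 0.0, 0.05

for epoch in range(200):  # 5. repeat for many epochs
    preds = [w * x for x in xs]                                  # 1. forward propagation
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)  # 2. compute loss (MSE)
    grad = sum(2 * (p - y) * x                                   # 3. backward: dL/dw
               for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= eta * grad                                              # 4. gradient descent update

print(round(w, 3))  # w converges toward 2.0
```

In a real network the "backward" step applies the chain rule layer by layer, but the loop structure is exactly this.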
Chain Rule (Core of Backpropagation)
Backpropagation works using the Chain Rule from calculus.
Chain Rule Concept:
If:
L = f(g(x))
Then:
dL/dx = dL/dg × dg/dx
In neural networks:
Loss depends on output
Output depends on hidden layer
Hidden layer depends on weights
So gradients are calculated layer by layer backward.
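The chain rule can be checked numerically with a concrete choice of functions. Here g(x) = x² and f(u) = u³ are arbitrary examples, so L = x⁶ and dL/dx should equal 6x⁵.

```python
# Chain rule check: L = f(g(x)) with g(x) = x^2, f(u) = u^3.

x = 2.0
g = x ** 2             # inner function g(x)
dg_dx = 2 * x          # dg/dx
dL_dg = 3 * g ** 2     # dL/dg, since f(u) = u^3
dL_dx = dL_dg * dg_dx  # chain rule: dL/dx = dL/dg * dg/dx

print(dL_dx, 6 * x ** 5)  # both give 192.0
```

Backpropagation is this same multiplication of local derivatives, repeated backward through every layer of the network.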
Full Training Flow
Input → Forward Propagation → Loss Calculation
↓
Backpropagation
↓
Weight Update (Gradient Descent)
↓
Repeat