Loss Functions
- This lesson explains loss functions and how they measure prediction errors in neural network models.
Loss vs Cost Function
Many people use these terms interchangeably, but technically the loss measures the error for a single example, while the cost averages the loss over all N examples:
Cost = \frac{1}{N} \sum_i Loss_i
During training, we usually minimize the cost function.
Mean Squared Error (MSE)
Formula:
MSE = \frac{1}{N} \sum (y_{true} - y_{pred})^2
Use Case:
- Regression problems
Why square?
- Makes the error positive
- Penalizes large errors more heavily
Example:
True values: [5, 3], predictions: [4, 6]
Sum of squared errors:
(5 - 4)^2 + (3 - 6)^2 = 1 + 9 = 10
MSE:
10 / 2 = 5
Advantages:
- Simple
- Differentiable
Disadvantages:
- Sensitive to outliers
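The worked example above can be reproduced in a few lines of NumPy (a minimal sketch; the variable names are illustrative, not from any particular library):

```python
import numpy as np

# True and predicted values from the worked example above
y_true = np.array([5.0, 3.0])
y_pred = np.array([4.0, 6.0])

# Squared errors: (5 - 4)^2 = 1, (3 - 6)^2 = 9
squared_errors = (y_true - y_pred) ** 2

# Mean squared error: (1 + 9) / 2 = 5.0
mse = squared_errors.mean()
print(mse)  # 5.0
```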
Binary Cross Entropy (Log Loss)
Used for binary classification.
Formula:
Loss = -[y \log(p) + (1 - y) \log(1 - p)]
Where:
y = true label (0 or 1)
p = predicted probability
Why not MSE for classification?
- BCE gives stronger gradients (MSE with a sigmoid output saturates)
- Faster convergence
- Better probabilistic interpretation
Example:
If the true label is 1 and the predicted probability is 0.9:
Loss = -\log(0.9) \approx 0.105
If the predicted probability is 0.1:
Loss = -\log(0.1) \approx 2.30
Confidently wrong predictions are penalized heavily.
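The asymmetry between confident-right and confident-wrong predictions can be sketched with a small helper function (an illustrative implementation, not a library API; the `eps` clipping is a standard numerical-safety assumption to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross entropy for a single prediction.

    y: true label (0 or 1), p: predicted probability.
    eps clips p away from 0 and 1 so log never receives 0.
    """
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Confident and correct: small loss, -log(0.9) ~ 0.105
print(binary_cross_entropy(1, 0.9))

# Confident and wrong: large loss, -log(0.1) ~ 2.30
print(binary_cross_entropy(1, 0.1))
```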
Categorical Cross Entropy
Used for multi-class classification.
Formula:
Loss = -\sum_i y_i \log(p_i)
Where:
y_i = true class indicator (one-hot encoded)
p_i = predicted probability for class i
Example:
True class: [0, 1, 0]
Predicted: [0.1, 0.8, 0.1]
Because the one-hot vector zeroes out every other term, only the true class survives the sum:
Loss = -\log(0.8) \approx 0.223
Used With:
Softmax activation in output layer
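The one-hot example above can be sketched in NumPy (an illustrative helper, not a library API; the `eps` clipping guards against log(0)):

```python
import numpy as np

def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Categorical cross entropy: -sum(y_i * log(p_i)).

    y_true is one-hot encoded; p_pred is a probability
    distribution (e.g. softmax output). eps avoids log(0).
    """
    p_pred = np.clip(p_pred, eps, 1.0)
    return -np.sum(y_true * np.log(p_pred))

y_true = np.array([0, 1, 0])        # true class is index 1
p_pred = np.array([0.1, 0.8, 0.1])  # softmax output

loss = categorical_cross_entropy(y_true, p_pred)
print(round(loss, 3))  # -log(0.8) ≈ 0.223
```

Note that the one-hot multiplication silently discards the probabilities assigned to the wrong classes; only the probability of the true class matters.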
Comparison Table

| Loss function | Task | Typical output activation |
| --- | --- | --- |
| MSE | Regression | Linear |
| Binary Cross Entropy | Binary classification | Sigmoid |
| Categorical Cross Entropy | Multi-class classification | Softmax |
Key Understanding
- Regression → MSE
- Binary classification → Binary Cross Entropy
- Multi-class classification → Categorical Cross Entropy