Loss Functions
- This lesson explains loss functions and how they measure prediction errors in neural network models.
Loss vs Cost Function
Many people use these terms interchangeably, but technically the loss measures the error for a single example, while the cost averages the loss over all N examples:
Cost = \frac{1}{N} \sum_i Loss_i
During training, we usually minimize the cost function.
Mean Squared Error (MSE)
Formula:
MSE = \frac{1}{N} \sum (y_{true} - y_{pred})^2
Use Case:
- Regression problems
Why square?
- Makes the error positive
- Penalizes large errors more heavily
Example:
True values: [5, 3], predictions: [4, 6]
Sum of squared errors:
(5 - 4)^2 + (3 - 6)^2 = 1 + 9 = 10
MSE:
10 / 2 = 5
Advantages:
- Simple
- Differentiable
Disadvantages:
- Sensitive to outliers
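The worked example above can be reproduced in a few lines of NumPy (a minimal sketch; the variable names are illustrative, not from any particular library):

```python
import numpy as np

# True and predicted values from the worked example above
y_true = np.array([5.0, 3.0])
y_pred = np.array([4.0, 6.0])

# Squared errors: (5 - 4)^2 = 1, (3 - 6)^2 = 9
squared_errors = (y_true - y_pred) ** 2

# Mean squared error: (1 + 9) / 2 = 5.0
mse = squared_errors.mean()
print(mse)  # 5.0
```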
Binary Cross Entropy (Log Loss)
Used for binary classification.
Formula:
Loss = -[y \log(p) + (1 - y) \log(1 - p)]
Where:
y = true label (0 or 1)
p = predicted probability
Why not MSE for classification?
- BCE gives stronger gradients (MSE with a sigmoid output saturates)
- Faster convergence
- Better probabilistic interpretation
Example:
If the true label is 1 and the predicted probability is 0.9:
Loss = -\log(0.9) \approx 0.105
If the predicted probability is 0.1:
Loss = -\log(0.1) \approx 2.30
Confidently wrong predictions are penalized heavily.
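The asymmetry between confident-right and confident-wrong predictions can be sketched with a small helper function (an illustrative implementation, not a library API; the `eps` clipping is a standard numerical-safety assumption to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross entropy for a single prediction.

    y: true label (0 or 1), p: predicted probability.
    eps clips p away from 0 and 1 so log never receives 0.
    """
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Confident and correct: small loss, -log(0.9) ~ 0.105
print(binary_cross_entropy(1, 0.9))

# Confident and wrong: large loss, -log(0.1) ~ 2.30
print(binary_cross_entropy(1, 0.1))
```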
Categorical Cross Entropy
Used for multi-class classification.
Formula:
Loss = -\sum_i y_i \log(p_i)
Where:
y_i = true class indicator (one-hot encoded)
p_i = predicted probability for class i
Example:
True class: [0, 1, 0]
Predicted: [0.1, 0.8, 0.1]
Because the one-hot vector zeroes out every other term, only the true class survives the sum:
Loss = -\log(0.8) \approx 0.223
Used With:
Softmax activation in output layer
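The one-hot example above can be sketched in NumPy (an illustrative helper, not a library API; the `eps` clipping guards against log(0)):

```python
import numpy as np

def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Categorical cross entropy: -sum(y_i * log(p_i)).

    y_true is one-hot encoded; p_pred is a probability
    distribution (e.g. softmax output). eps avoids log(0).
    """
    p_pred = np.clip(p_pred, eps, 1.0)
    return -np.sum(y_true * np.log(p_pred))

y_true = np.array([0, 1, 0])        # true class is index 1
p_pred = np.array([0.1, 0.8, 0.1])  # softmax output

loss = categorical_cross_entropy(y_true, p_pred)
print(round(loss, 3))  # -log(0.8) ≈ 0.223
```

Note that the one-hot multiplication silently discards the probabilities assigned to the wrong classes; only the probability of the true class matters.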
Comparison Table

| Loss function | Task | Typical output activation |
| --- | --- | --- |
| MSE | Regression | Linear |
| Binary Cross Entropy | Binary classification | Sigmoid |
| Categorical Cross Entropy | Multi-class classification | Softmax |
Key Understanding
- Regression → MSE
- Binary classification → Binary Cross Entropy
- Multi-class classification → Categorical Cross Entropy