Loss Functions

  • This lesson explains loss functions and how they measure prediction errors in neural network models.
  • Loss vs Cost Function

    Many people use these terms interchangeably, but technically:

    Term            Meaning
    Loss Function   Error for a single training example
    Cost Function   Average loss over the entire dataset

    Cost = (1/N) Σ Loss_i

    During training, we usually minimize the cost function.
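    The loss/cost distinction can be illustrated with a short sketch in plain Python (using squared loss as the per-example loss):

```python
# Loss: error for ONE training example (here, squared loss)
def squared_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

y_true = [5, 3]
y_pred = [4, 6]

# Per-example losses
losses = [squared_loss(t, p) for t, p in zip(y_true, y_pred)]  # [1, 9]

# Cost: the average loss over the whole dataset -- this is what training minimizes
cost = sum(losses) / len(losses)  # 5.0
```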


    Mean Squared Error (MSE)

    Formula:

    MSE = (1/N) Σ (y_true - y_pred)^2

    Use Case:

    Regression problems

    Why square?

    • Makes error positive

    • Penalizes large errors more

    Example:

    Actual   Predicted
    5        4
    3        6

    Total squared error:

    (5 - 4)^2 + (3 - 6)^2 = 1 + 9 = 10

    MSE:

    10 / 2 = 5

    Advantages:

    • Simple
    • Differentiable

    Disadvantages:

    • Sensitive to outliers
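    Both the worked example and the outlier sensitivity can be checked with a few lines of plain Python (a minimal sketch, not a library implementation):

```python
def mse(y_true, y_pred):
    # Mean of squared differences between actual and predicted values
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Worked example from above
print(mse([5, 3], [4, 6]))  # 5.0

# Because errors are squared, a single outlier dominates the average
print(mse([5, 3, 100], [4, 6, 10]))  # (1 + 9 + 8100) / 3 ≈ 2703.3
```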


    Binary Cross Entropy (Log Loss)

    Used for binary classification.

    Formula:

    Loss = -[ y·log(p) + (1 - y)·log(1 - p) ]

    Where:

    • y = true label (0 or 1)

    • p = predicted probability

    Why not MSE for classification?

    Because:

    • BCE gives stronger gradients

    • Faster convergence

    • Better probability interpretation
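    One way to see the "stronger gradients" point: for a sigmoid output p, the gradient of BCE with respect to the logit is simply (p - y), while the MSE gradient carries an extra p·(1 - p) factor that vanishes when the network is confidently wrong. A small sketch (these gradient formulas assume a sigmoid output unit):

```python
# Gradients w.r.t. the logit z, where p = sigmoid(z):
#   BCE:  dL/dz = p - y
#   MSE:  dL/dz = 2 * (p - y) * p * (1 - p)   (vanishes as p saturates)
p, y = 0.999, 0.0  # confidently WRONG prediction

bce_grad = p - y                      # ≈ 0.999 -- still a strong learning signal
mse_grad = 2 * (p - y) * p * (1 - p)  # ≈ 0.002 -- almost no learning signal
```

This is why MSE can stall on saturated sigmoid outputs, while BCE keeps pushing the prediction toward the true label.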

    Example:

    If:

    • True label = 1

    • Predicted probability = 0.9

    Loss = -log(0.9) ≈ 0.105

    If predicted = 0.1:

    Loss = -log(0.1) ≈ 2.30

    Confident but wrong predictions are penalized heavily.
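    The two cases above can be reproduced directly (a minimal sketch using Python's math module):

```python
import math

def binary_cross_entropy(y, p):
    # y is the true label (0 or 1); p is the predicted probability of class 1
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(binary_cross_entropy(1, 0.9))  # ≈ 0.105 -- confident and correct: small loss
print(binary_cross_entropy(1, 0.1))  # ≈ 2.303 -- confident and wrong: large loss
```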


    Categorical Cross Entropy

    Used for multi-class classification.

    Formula:

    Loss = -Σ y_i · log(p_i)

    Where:

    • y_i = true label for class i (one-hot encoded)

    • p_i = predicted probability for class i

    Example:

    True class: [0, 1, 0]
    Predicted: [0.1, 0.8, 0.1]

    Loss = -log(0.8) ≈ 0.223

    Used With:

    • Softmax activation in output layer
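    A minimal sketch of the one-hot example above; with one-hot labels, only the true class contributes to the sum:

```python
import math

def categorical_cross_entropy(y_true, y_pred):
    # -Σ y_i * log(p_i); with one-hot labels only the true class term survives
    return -sum(y * math.log(p) for y, p in zip(y_true, y_pred) if y > 0)

loss = categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])
print(loss)  # -log(0.8) ≈ 0.223
```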

    Comparison Table

    Loss Function              Use Case                    Output Layer Activation
    MSE                        Regression                  Linear
    Binary Cross Entropy       Binary Classification       Sigmoid
    Categorical Cross Entropy  Multi-class Classification  Softmax

    Key Understanding

    • Regression → MSE

    • Binary classification → Binary Cross Entropy

    • Multi-class classification → Categorical Cross Entropy