Variability & Data Spread

  • Learn how to measure variability and spread in datasets using statistical methods.
  • What is Variability?

    Variability refers to how much data points differ from each other or from the average (mean).

    If values are very close to the mean → Low variability
    If values are far from the mean → High variability

    Example:

    Dataset A:
    50, 52, 49, 51, 50

    Dataset B:
    10, 90, 30, 70, 50

    Both datasets have nearly the same mean (50.4 for A, 50 for B), but Dataset B is far more spread out.
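The comparison above can be checked with a short sketch. It uses only Python's standard statistics module; the helper max_abs_deviation is a hypothetical name introduced here to give a rough sense of spread before variance is defined.

```python
from statistics import mean

dataset_a = [50, 52, 49, 51, 50]
dataset_b = [10, 90, 30, 70, 50]

def max_abs_deviation(values):
    """Largest distance of any value from the mean (hypothetical helper)."""
    m = mean(values)
    return max(abs(x - m) for x in values)

for name, values in [("A", dataset_a), ("B", dataset_b)]:
    print(f"Dataset {name}: mean = {mean(values):.1f}, "
          f"largest deviation = {max_abs_deviation(values):.1f}")
```

The means differ by less than half a point, while the largest deviation in Dataset B is many times that of Dataset A.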



    Variance

    Variance measures the average squared deviation from the mean.

    Formula:

    $$\text{Variance} = \frac{\sum (X - \bar{X})^2}{N}$$

    Where:

    • $X$ = data value

    • $\bar{X}$ = mean

    • $N$ = total number of values


    Example:

    Data: 10, 20, 30

    Mean = 20

    Step 1: Find the squared deviations from the mean
    (10−20)² = 100
    (20−20)² = 0
    (30−20)² = 100

    Step 2: Average of squared deviations
    Variance = (100 + 0 + 100) / 3 ≈ 66.67
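The two steps above can be sketched in plain Python, without any library. The variable names are illustrative only.

```python
data = [10, 20, 30]

# Mean of the data
mean = sum(data) / len(data)

# Step 1: squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 2: average of the squared deviations (divide by N)
variance = sum(squared_deviations) / len(data)

print("Mean:", mean)
print("Squared deviations:", squared_deviations)
print("Variance:", round(variance, 2))
```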


    Example:

Calculating Variance using NumPy

This code computes the variance of the dataset with NumPy’s var() function. By default var() divides by N, so the result matches the hand calculation above (about 66.67).

import numpy as np

# Same dataset as the worked example
data = [10, 20, 30]

# np.var() returns the population variance (divides by N)
print("Variance:", np.var(data))
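When the data is only a sample from a larger population, many textbooks divide by N − 1 instead of N. The formula in this lesson uses N, so the sketch below is a side note showing how NumPy's ddof parameter switches between the two conventions.

```python
import numpy as np

data = [10, 20, 30]

population_variance = np.var(data)       # divides by N (ddof=0, the default)
sample_variance = np.var(data, ddof=1)   # divides by N - 1

print("Population variance:", round(population_variance, 2))
print("Sample variance:", round(sample_variance, 2))
```

For this dataset the population variance is about 66.67 and the sample variance is 100, so it is worth stating which convention a reported figure uses.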
  • Standard Deviation

    Standard Deviation (SD) is the square root of variance.

    $$\text{SD} = \sqrt{\text{Variance}}$$

    👉 It expresses how far values typically lie from the mean, in the same units as the data.

    From the previous example:
    Variance = 66.67

    SD = √66.67 ≈ 8.16
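As a minimal sketch, the same calculation can be done with Python's standard math module, taking the square root of the hand-computed variance.

```python
import math

data = [10, 20, 30]
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation is the square root of the variance
sd = math.sqrt(variance)

print("Variance:", round(variance, 2))
print("SD:", round(sd, 2))
```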


    Example:

Calculating Standard Deviation using NumPy

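This code computes the standard deviation of the dataset with NumPy’s std() function. By default std() returns the population standard deviation, that is, the square root of the variance computed with N in the denominator.

import numpy as np

data = [10, 20, 30]

# np.std() is the square root of np.var(); expect about 8.16
print("Standard Deviation:", np.std(data))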
  • Practical Example (Student Marks)

    Marks:
    60, 65, 70, 75, 80

    Mean = 70

    SD = √50 ≈ 7.07

    If SD is small → Marks are consistent
    If SD is large → Marks vary widely
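To make the contrast concrete, here is a small sketch using Python's statistics module. The first list is the marks from this example; the second list, spread_out_marks, is a hypothetical class invented for comparison, chosen to have the same mean.

```python
from statistics import mean, pstdev

marks = [60, 65, 70, 75, 80]              # marks from the example above
spread_out_marks = [40, 55, 70, 85, 100]  # hypothetical class, same mean

for label, values in [("Example marks", marks),
                      ("Hypothetical marks", spread_out_marks)]:
    # pstdev divides by N, matching the variance formula used in this lesson
    print(f"{label}: mean = {mean(values)}, SD = {pstdev(values):.2f}")
```

Both classes average 70, but the SD is about 7.07 for the example marks and about 21.21 for the hypothetical ones, so the second class is far less consistent.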


    Interpreting Data Spread

    Standard Deviation    Meaning
    Low SD                Data points are close to the mean
    High SD               Data points are widely spread
    SD = 0                All values are the same
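The three cases can be illustrated with a brief sketch. All three lists are hypothetical values made up for this illustration.

```python
from statistics import pstdev

# Hypothetical datasets, one per case
examples = {
    "all values the same": [5, 5, 5, 5],
    "values close to the mean": [49, 50, 51, 50],
    "values widely spread": [10, 90, 30, 70],
}

for description, values in examples.items():
    print(f"{description}: SD = {pstdev(values):.2f}")
```

The first dataset gives an SD of exactly 0, and the SD grows as the values move farther from their mean.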


    Real-World Meaning

    Education:
    Low SD → Students perform similarly
    High SD → Some very high & very low scorers

    Business:
    Low SD → Stable sales
    High SD → Fluctuating sales

    Finance:
    Higher SD of returns → More volatile investment, usually treated as higher risk