Variability & Data Spread
-
Learn how to measure variability and spread in datasets using statistical methods.
What is Variability?
Variability refers to how much data points differ from each other or from the average (mean).
If values are very close to the mean → Low variability
If values are far from the mean → High variability
Example:
Dataset A:
50, 52, 49, 51, 50
Dataset B:
10, 90, 30, 70, 50
Both datasets have similar means (50.4 and 50), but Dataset B has a much higher spread.
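As a rough illustration of the comparison above, the following sketch computes each dataset's mean, its range (largest minus smallest value), and the average absolute distance from the mean. The range and the average distance are illustrative choices for this sketch, and the helper name summarize is hypothetical.

```python
import numpy as np

dataset_a = [50, 52, 49, 51, 50]
dataset_b = [10, 90, 30, 70, 50]


def summarize(values):
    """Return (mean, range, average absolute distance from the mean)."""
    arr = np.array(values, dtype=float)
    mean = arr.mean()
    value_range = arr.max() - arr.min()
    avg_distance = np.abs(arr - mean).mean()
    return mean, value_range, avg_distance


for name, values in (("A", dataset_a), ("B", dataset_b)):
    mean, value_range, avg_distance = summarize(values)
    print(f"Dataset {name}: mean={mean:.1f}, "
          f"range={value_range:.0f}, avg distance={avg_distance:.2f}")
```

The means come out almost identical, while both spread measures are far larger for Dataset B.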
Variance
Variance measures the average squared deviation from the mean.
Formula:
$\text{Variance} = \frac{\sum (X - \bar{X})^2}{N}$
Where:
$X$ = data value
$\bar{X}$ = mean
$N$ = total number of values
Example:
Data: 10, 20, 30
Mean = 20
Step 1: Find deviations
(10−20)² = 100
(20−20)² = 0
(30−20)² = 100
Step 2: Average of squared deviations
Variance = (100 + 0 + 100) / 3 ≈ 66.67
Example:
Calculating Variance using NumPy
This code calculates and prints the variance of the given dataset using NumPy’s var() function, which measures how much the values spread out from the mean. By default, var() divides by N, the total number of values.
import numpy as np
data = [10, 20, 30]
print("Variance:", np.var(data))
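To connect the hand calculation with the NumPy result, here is a minimal sketch that repeats the steps in plain Python and compares the result with np.var(). The variable names are illustrative. The last line uses NumPy's ddof=1 option, which divides by N − 1 instead of N; it is shown only to explain why other tools may report a different number for the same data.

```python
import numpy as np

data = [10, 20, 30]

# Mean of the data
mean = sum(data) / len(data)

# Step 1: squared deviation of each value from the mean
squared_devs = [(x - mean) ** 2 for x in data]

# Step 2: average of the squared deviations (divide by N)
manual_variance = sum(squared_devs) / len(data)

print("Squared deviations:", squared_devs)
print("Manual variance:", round(manual_variance, 2))
print("np.var (divides by N):", round(float(np.var(data)), 2))
print("np.var with ddof=1 (divides by N - 1):", float(np.var(data, ddof=1)))
```

The manual result and np.var() agree at about 66.67, while the ddof=1 version gives 100.0 for this dataset.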
Standard Deviation
Standard Deviation (SD) is the square root of variance.
$SD = \sqrt{\text{Variance}}$
👉 It tells how far the data values typically lie from the mean, expressed in the original units of the data.
From the previous example:
Variance = 66.67
SD = √66.67 ≈ 8.16
Example:
Calculating Standard Deviation using NumPy
This code calculates and prints the standard deviation of the dataset using NumPy’s std() function, showing how much the values deviate from the mean on average.
import numpy as np
data = [10, 20, 30]
print("Standard Deviation:", np.std(data))
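As a quick check of the relationship between the two measures, this sketch takes the square root of the variance and compares it with np.std(). The variable names are illustrative.

```python
import math

import numpy as np

data = [10, 20, 30]

variance = float(np.var(data))           # average squared deviation
sd_from_variance = math.sqrt(variance)   # square root of the variance

print("SD from variance:", round(sd_from_variance, 2))
print("Matches np.std:", math.isclose(sd_from_variance, float(np.std(data))))
```

Both routes give roughly 8.16, expressed in the same units as the original data.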
Practical Example (Student Marks)
Marks:
60, 65, 70, 75, 80
Mean = 70
If SD is small → Marks are consistent
If SD is large → Marks vary widely
Interpreting Data Spread
Real-World Meaning
Education:
Low SD → Students perform similarly
High SD → Some very high & very low scorers
Business:
Low SD → Stable sales
High SD → Fluctuating sales
Finance:
Higher SD → Higher risk
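To make the low-SD versus high-SD reading concrete, the sketch below compares the student marks from the practical example with a second set of marks. The second set (wide_marks) is hypothetical, invented here only for contrast, and the helper name describe is likewise an assumption.

```python
import numpy as np

marks = [60, 65, 70, 75, 80]        # marks from the practical example
wide_marks = [40, 55, 70, 85, 100]  # hypothetical class with the same mean


def describe(label, values):
    """Print and return the mean and standard deviation of the values."""
    mean = float(np.mean(values))
    sd = float(np.std(values))  # divides by N, like the formula in this lesson
    print(f"{label}: mean={mean:.1f}, SD={sd:.2f}")
    return mean, sd


mean_a, sd_a = describe("Original marks", marks)
mean_b, sd_b = describe("Hypothetical marks", wide_marks)
```

Both sets have a mean of 70, but the SD is about 7.07 for the original marks and about 21.21 for the hypothetical ones, so the second class is far less consistent even though the average is the same.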