Descriptive Statistics
-
Learn how to summarize datasets using descriptive statistics.
Descriptive statistics helps us summarize and understand data.
Before using visualization tools like Seaborn or Matplotlib, we first calculate basic statistical measures.
Mean (Average)
The Mean is the sum of all values divided by the total number of values.
Formula:
$$\text{Mean} = \frac{\sum X}{N}$$
Example:
Student marks:
50, 60, 70, 80, 90
Mean = (50 + 60 + 70 + 80 + 90) / 5 = 350 / 5 = 70
The average mark is 70.
Example:
Calculating Mean using NumPy
This code calculates and prints the average (mean) of a list of marks using NumPy’s mean() function.
import numpy as np
marks = [50, 60, 70, 80, 90]
print("Mean:", np.mean(marks))
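To connect the formula to the code, here is a minimal sketch that computes the same mean in plain Python, without NumPy. The variable names total and count are illustrative choices, not part of the original example.

```python
# Plain-Python version of the formula: Mean = (sum of X) / N
marks = [50, 60, 70, 80, 90]

total = sum(marks)    # sum of all values
count = len(marks)    # total number of values
mean = total / count  # divide the sum by the count

print("Mean:", mean)
```

This should agree with the NumPy result for the same list of marks.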
Median
The Median is the middle value when data is arranged in ascending order.
Example 1 (Odd numbers):
Data:
10, 20, 30, 40, 50
Middle value = 30
Example 2 (Even numbers):
Data:
10, 20, 30, 40
Middle values = 20 and 30
Median = (20 + 30) / 2 = 25
Example:
Calculating Median using NumPy
This code calculates and prints the median (middle value) of the given dataset using NumPy’s median() function.
import numpy as np
data = [10, 20, 30, 40]
print("Median:", np.median(data))
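The odd and even cases described earlier can also be sketched by hand. The function name manual_median is a hypothetical helper introduced only for illustration; in practice np.median() does this work for you.

```python
def manual_median(values):
    """Return the median using the odd/even rule (illustrative helper)."""
    if not values:
        raise ValueError("manual_median() needs at least one value")
    ordered = sorted(values)  # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the median
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print("Odd: ", manual_median([10, 20, 30, 40, 50]))
print("Even:", manual_median([10, 20, 30, 40]))
```

Sorting first matters: the rule only works on data in ascending order, so the helper sorts a copy rather than assuming the input is already ordered.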
👉 Median is useful when data has outliers, because a few extreme values pull the mean toward them but barely move the median.
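A short sketch can make the effect of an outlier visible. It uses Python's standard-library statistics module instead of NumPy, and the value 900 is an assumed outlier invented for this illustration.

```python
import statistics

marks = [50, 60, 70, 80, 90]
marks_with_outlier = [50, 60, 70, 80, 900]  # 900 is an assumed outlier

# Compare how each measure reacts when one value becomes extreme
mean_before = statistics.mean(marks)
mean_after = statistics.mean(marks_with_outlier)
median_before = statistics.median(marks)
median_after = statistics.median(marks_with_outlier)

print("Mean:  ", mean_before, "->", mean_after)
print("Median:", median_before, "->", median_after)
```

Here the mean jumps from 70 to 232, while the median stays at 70.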
Mode
The Mode is the value that appears most frequently.
Example:
Data:
5, 10, 10, 20, 20, 20, 30
Mode = 20 (appears 3 times)
Example:
Calculating Mode using SciPy
This code uses scipy.stats.mode() to calculate and display the mode (most frequently occurring value) from the given dataset.
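from scipy import stats
data = [5, 10, 10, 20, 20, 20, 30]
# stats.mode() returns a result object holding the mode and how often it occurs
result = stats.mode(data)
print("Mode:", result.mode, "Count:", result.count)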
👉 Mode is useful for categorical data, such as product names or colours, where a mean cannot be calculated.
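For categorical values, one possible approach is to count labels with collections.Counter from the standard library. This is a sketch under assumptions: the colours list is hypothetical survey data, and Counter is used here in place of SciPy.

```python
from collections import Counter

# Hypothetical categorical data: favourite colours from a survey
colours = ["red", "blue", "green", "blue", "red", "blue"]

counts = Counter(colours)  # maps each label to its frequency
mode_value, mode_count = counts.most_common(1)[0]  # most frequent label

print("Mode:", mode_value, "appears", mode_count, "times")
```

If two labels tie for the highest count, most_common(1) reports only one of them, so check counts directly when ties matter.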
Practical Example with Real Dataset
Example: Monthly Sales (₹)
Sales data:
20000, 22000, 25000, 22000, 27000, 30000, 22000
Interpretation:
Average sales = ₹24000 (168000 / 7)
Middle sales value = ₹22000
Most frequent sales value = ₹22000 (appears 3 times)
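The three figures in the interpretation can be checked with a few lines of code. This sketch uses Python's standard-library statistics module as one convenient option; the variable names are illustrative.

```python
import statistics

# Monthly sales in rupees, taken from the example above
sales = [20000, 22000, 25000, 22000, 27000, 30000, 22000]

mean_sales = statistics.mean(sales)      # sum of sales / number of months
median_sales = statistics.median(sales)  # middle value after sorting
mode_sales = statistics.mode(sales)      # most frequent value

print("Mean:  ", mean_sales)
print("Median:", median_sales)
print("Mode:  ", mode_sales)
```

The mean is higher than the median and mode because the stronger months (₹27000 and ₹30000) pull the average upward.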